Class Mean Vector Component and Discriminant Analysis for Kernel Subspace Learning


Alexandros Iosifidis
Department of Engineering, ECE, Aarhus University, Denmark

Abstract — The kernel matrix used in kernel methods encodes all the information required for solving complex nonlinear problems defined on data representations in the input space using simple, but implicitly defined, solutions. Spectral analysis on the kernel matrix defines an explicit nonlinear mapping of the input data representations to a subspace of the kernel space, which can be used for directly applying linear methods. However, the selection of the kernel subspace is crucial for the performance of the subsequent processing steps. In this paper, we propose a component analysis method for kernel-based dimensionality reduction that optimally preserves the pair-wise distances of the class means in the feature space. We provide extensive analysis on the connection of the proposed criterion to those used in kernel principal component analysis and kernel discriminant analysis, leading to a discriminant analysis version of the proposed method. We illustrate the properties of the proposed approach on real-world data.

Index Terms — Kernel subspace learning, Principal Component Analysis, Kernel Discriminant Analysis, Approximate kernel subspace learning

I. INTRODUCTION

Kernel methods are very effective in numerous machine learning problems, including nonlinear regression, classification, and retrieval. The main idea in kernel-based learning is to nonlinearly map the original data representations to a feature space of (usually) increased dimensionality and solve an equivalent (but simpler) problem using a simple (linear) method on the transformed data. That is, all the variability and richness required for solving a complex problem defined on the original data representations is encoded by the adopted nonlinear mapping. Since for commonly used nonlinear mappings in kernel methods the dimensionality of the feature space is arbitrary (virtually infinite), the data representations in the feature space are implicitly obtained by expressing their pair-wise products stored in the so-called kernel matrix $K \in \mathbb{R}^{N \times N}$, where $N$ is the number of samples forming the problem at hand.

The feature space determined by spectral decomposition of $K$ has been shown to encode several properties of interest: it has been used to define low-dimensional features suitable for linear class discrimination [1], to train linear classifiers capturing nonlinear patterns of the input data [2], to reveal nonlinear data structures in spectral clustering [3], and it has been shown to encode information related to the entropy of the input data distribution [4]. The expressive power of $K$ and its resulting basis motivates us to study its discriminative power as well.

In this paper we first propose a kernel matrix component analysis method for kernel-based dimensionality reduction that optimally preserves the pair-wise distances of the class means in the kernel space. We show that the proposed criterion also preserves the distances of the class means with respect to the total mean of the data in the kernel space, as well as the Euclidean divergence between the class distributions in the input space. We analyze the connection of the proposed criterion with those used in (uncentered) kernel principal component analysis and kernel discriminant analysis, providing new insights related to the dimensionality selection process of these two methods.
Extensions using approximate kernel matrix definitions are subsequently proposed. Finally, exploiting the connection of the proposed method to kernel discriminant analysis, we propose a discriminant analysis method that is able to produce kernel subspaces whose dimensionality is not bounded by the number of classes forming the problem at hand. Experiments on real-world data illustrate our findings.

II. PRELIMINARIES

Let us denote by $S = \{S_c\}_{c=1}^{C}$ a set of $D$-dimensional vectors, where $S_c = \{x_1^c, \ldots, x_{N_c}^c\}$ is the set of vectors belonging to class $c$. In kernel-based learning [2], the samples in $S$ are mapped to the kernel space $F$ by using a nonlinear function $\phi(\cdot)$, such that $x_i \in \mathbb{R}^D \rightarrow \phi(x_i) \in F$, $i = 1, \ldots, N$, where $N = \sum_{c=1}^{C} N_c$. Since the dimensionality of $F$ is arbitrary (virtually infinite), the data representations in $F$ are not calculated. Instead, the nonlinear mapping is implicitly performed using the kernel function expressing dot products between the data representations in $F$, i.e. $\kappa(x_i, x_j) = \phi(x_i)^\top \phi(x_j)$. By applying the kernel function on all training data pairs, the so-called kernel matrix $K \in \mathbb{R}^{N \times N}$ is calculated. One of the most important properties of the kernel function $\kappa(\cdot,\cdot)$ is that it leads to a positive semi-definite (psd) kernel matrix $K$. While indefinite matrices [5], [6] and general similarity matrices [7] have also been researched, in this paper we will consider only positive semi-definite kernel functions.
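As a minimal illustration of the above definitions (not part of the original paper; all function and variable names are our own), the following sketch builds a Gaussian kernel matrix for a toy training set and checks its positive semi-definiteness numerically.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) for the rows of X."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

# Toy data: N samples in R^D.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))

K = gaussian_kernel_matrix(X, sigma=1.0)

# A valid kernel matrix is symmetric psd: all eigenvalues should be >= 0
# (up to numerical precision).
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)  # expected: True
```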

The importance of kernel methods in machine learning comes from the fact that, in the case when a linear method can be expressed based on dot products of the input data, they can be readily used to devise nonlinear extensions. This is achieved by exploiting the Representer theorem [2], stating that the solution of a linear model in $F$, e.g. $W_\phi \in \mathbb{R}^{|F| \times C}$, can be expressed as a linear combination of the training data, i.e. $W_\phi = \Phi A$, where $\Phi = [\phi(x_1), \ldots, \phi(x_N)]$ and $A \in \mathbb{R}^{N \times C}$ is a matrix containing the combination weights. Then, the output of a linear model in $F$ can be calculated by $o_i = W_\phi^\top \phi(x_i) = A^\top k_i$, where $k_i \in \mathbb{R}^N$ is a vector having its $j$-th element equal to $[k_i]_j = \kappa(x_j, x_i)$, $j = 1, \ldots, N$. That is, instead of optimizing with respect to the arbitrary-dimensional $W_\phi$, the solution involves the optimization of the combination weights $A$.

Another important aspect of using kernel methods is that they allow us to train models of increased discrimination power [2], [8]. Considering the Vapnik-Chervonenkis (VC) dimension of a linear classifier defined on the data representations in the original feature space $\mathbb{R}^D$, the number of samples that can be shattered (i.e., correctly classified irrespectively of their arrangement) is equal to $D + 1$. On the other hand, the VC dimension of a linear classifier defined on the data representations in $F$ is higher. For the most widely used kernel function, i.e. the Gaussian kernel function $\kappa(x_i, x_j) = \exp\left(-\frac{1}{2\sigma^2}\|x_i - x_j\|_2^2\right)$, it is virtually infinite. In practice this means that, under mild assumptions, a linear classifier applied on data representations in $F$ can classify all training data.

Using the definition of $K = \Phi^\top \Phi$ and its psd property, its spectral decomposition leads to $\tilde{\Phi} = [\tilde{\phi}_1, \ldots, \tilde{\phi}_N] = \Lambda^{\frac{1}{2}} U^\top$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$ and $U \in \mathbb{R}^{N \times N}$ contain the eigenvalues and the corresponding eigenvectors of $K$. Thus, an explicit nonlinear mapping from $x_i \in \mathbb{R}^D$ to $\tilde{\phi}_i \in \mathbb{R}^N$ is defined, such that the $d$-th dimension of the training data is:

$[\tilde{\Phi}]_d = \sqrt{\lambda_d}\, u_d^\top,$   (1)

where $\lambda_d$ is the $d$-th largest eigenvalue of $K$ and $u_d$ is the corresponding eigenvector. In the case where $K$ is centered in $F$, $\mathbb{R}^N$ is the space defined by kernel Principal Component Analysis (kPCA) [2]. Moreover, as has been shown in [9], [10], the kernel matrix need not be centered. In the latter case, $N$ is called the effective dimensionality of $F$ and $\mathbb{R}^N$ is the corresponding effective subspace of $F$; this case is also essentially the same as uncentered kernel PCA. Recently, kernel Entropy Component Analysis (kECA) was proposed [4], following the uncentered kernel approach and sorting the eigenvectors based on the size of the entropy values defined as $(\sqrt{\lambda_d}\, u_d^\top \mathbf{1})^2$, where $\mathbf{1} \in \mathbb{R}^N$ is the vector of ones. Kernel ECA has also been shown to be the projection that optimally preserves the length of the data mean vector in $F$ [11]. After sorting the eigenvectors based on the size of either the eigenvalues or the entropy values, the $l$-th dimension of a sample $x_j$ in the kernel subspace is obtained by:

$[y_j]_l = \lambda_l^{-\frac{1}{2}}\, u_l^\top k_j,$   (2)

where $k_j \in \mathbb{R}^N$ is a vector having elements $[k_j]_i = \kappa(x_i, x_j)$, $i = 1, \ldots, N$. Note that the use of such an explicit mapping preserves the discriminative power of the kernel space, since a linear classifier on the data representations in $\mathbb{R}^N$ can successfully classify all $N$ training samples. When a lower-dimensional subspace $\mathbb{R}^M$, $M < N$, of $F$ is sought, the criterion for selecting an eigen-pair $(u_d, \lambda_d)$ is defined in a generative manner: minimizing the quantity $\|K - U_M \Lambda_M U_M^\top\|_2^2$, leading to the selection of the eigen-pairs corresponding to the $M$ maximal eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_M$ in the case of kernel PCA, or maximizing the entropy of the data distribution, leading to the selection of the eigen-pairs corresponding to the $M$ maximal entropy values in the case of kernel ECA.
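To make the explicit mapping of Eqs. (1)-(2) concrete, here is a small sketch (our own illustration under the above definitions; function and variable names are assumptions) that eigendecomposes an uncentered kernel matrix and maps training or new samples to the effective kernel subspace.

```python
import numpy as np

def kernel_subspace_mapping(K, M=None, eps=1e-10):
    """Eigendecompose the (uncentered) kernel matrix K and return a function
    mapping kernel vectors k_j to the M-dimensional kernel subspace (Eq. 2)."""
    eigvals, eigvecs = np.linalg.eigh(K)            # ascending order
    order = np.argsort(eigvals)[::-1]               # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > eps                            # drop numerically zero directions
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    if M is not None:
        eigvals, eigvecs = eigvals[:M], eigvecs[:, :M]

    def transform(K_new):
        # Rows of K_new hold kernel values between the new samples and the N training samples.
        return K_new @ eigvecs / np.sqrt(eigvals)   # [y_j]_l = lambda_l^{-1/2} u_l^T k_j
    return transform, eigvals, eigvecs

# Usage with the Gaussian kernel matrix from the previous sketch:
# transform, lam, U = kernel_subspace_mapping(K, M=10)
# Y_train = transform(K)        # training data in the kernel subspace
# Y_test  = transform(K_test)   # K_test: kernel values between test and training samples
```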
III. CLASS MEAN VECTOR COMPONENT ANALYSIS

Since the data representations in the kernel space $F$ form classes which are linearly separable, we make the assumption that the classes in these spaces are unimodal. We express the distance between classes $k$ and $m$ by:

$d(c_k, c_m) = \| m_k - m_m \|_2^2,$   (3)

where $m_c$ is the mean vector of class $c$ in $F$. Since $d(c_k, c_m)$ is calculated by using elements of the kernel matrix $K$, i.e. $d(c_k, c_m) = e_k^\top K e_k - 2\, e_k^\top K e_m + e_m^\top K e_m$ (with the class indicator vectors $e_c$ defined below), we exploit the spectral decomposition of $K$ and express the mean vectors in the effective kernel subspace, i.e., $m_c \in \mathbb{R}^N$ with their $d$-th dimension equal to:

$[m_c]_d = \frac{1}{N_c} \sum_{\phi_i \in S_c} [\tilde{\phi}_i]_d = \sqrt{\lambda_d}\, u_d^\top e_c,$   (4)

where $e_c \in \mathbb{R}^N$ is the indicator vector for class $c$ having elements equal to $[e_c]_i = 1/N_c$ for $\phi_i \in S_c$, and $[e_c]_i = 0$ otherwise. Then, $d(c_k, c_m)$ takes the form:

$d(c_k, c_m) = \sum_{d=1}^{N} \lambda_d \left( u_d^\top e_k - u_d^\top e_m \right)^2 = \frac{N_k + N_m}{N_k N_m} \sum_{d=1}^{N} \lambda_d \cos^2(u_d, e_k - e_m).$   (5)

From the above, it can be seen that the eigen-pairs of $K$ maximally contributing to the distance between the two class means are those with a high eigenvalue $\lambda_d$ and an eigenvector angularly aligned to the vector $e_k - e_m$. We express the weighted pair-wise distance between all $C$ classes in $S$ by:

$D = \sum_{k=1}^{C} \sum_{m=1}^{C} p_k p_m\, d(c_k, c_m) = \sum_{d=1}^{N} \lambda_d \sum_{k=1}^{C} \sum_{m=1}^{C} p_k p_m \left( u_d^\top e_k - u_d^\top e_m \right)^2 = \sum_{d=1}^{N} \lambda_d D_d,$   (6)

where each class contributes proportionally to its cardinality, $p_c = N_c / N$, $c = 1, \ldots, C$, and

$D_d = \frac{1}{N^2} \sum_{k=1}^{C} \sum_{m=1}^{C} (N_k + N_m) \cos^2(u_d, e_k - e_m)$   (7)

expresses the weighted alignment of the eigenvector $u_d$ to all possible differences of class indicator vector pairs. To define the subspace $\mathbb{R}^M$ of the kernel space $F$ that maximally preserves the pair-wise distances between the class means in the kernel space, we keep the eigen-pairs minimizing:

$\tilde{D} = \left( D - D_{1:M} \right)^2,$   (8)

where $D_{1:M}$ denotes the value of (6) computed using only the $M$ retained eigen-pairs; equivalently, we keep the $M$ eigen-pairs with the largest contributions $\lambda_d D_d$.
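The selection rule in (6)-(8) can be implemented directly from the eigen-decomposition of $K$ and the class indicator vectors. The sketch below (our own; not the author's reference implementation, and all names are assumptions) scores each eigen-pair by its contribution $\lambda_d D_d$ and would keep the $M$ highest-scoring ones.

```python
import numpy as np

def cmvca_scores(K, labels):
    """Score each eigen-pair of K by its contribution lambda_d * D_d to the
    weighted pair-wise distance between class means (Eqs. 6-7)."""
    classes = np.unique(labels)
    # Class indicator vectors: [e_c]_i = 1/N_c for samples of class c, 0 otherwise.
    E = np.stack([(labels == c) / np.sum(labels == c) for c in classes], axis=1)  # (N, C)
    p = np.array([np.mean(labels == c) for c in classes])                          # priors N_c / N

    lam, U = np.linalg.eigh(K)
    lam, U = lam[::-1], U[:, ::-1]                  # decreasing eigenvalues

    proj = U.T @ E                                   # [proj]_{d,c} = u_d^T e_c
    diffs = proj[:, :, None] - proj[:, None, :]      # (N, C, C): u_d^T e_k - u_d^T e_m
    # D_d = sum_{k,m} p_k p_m (u_d^T e_k - u_d^T e_m)^2, weighted by lambda_d:
    scores = lam * np.einsum('k,m,dkm->d', p, p, diffs**2)
    return scores

# Keeping the M eigen-pairs with the largest scores minimizes (8):
# scores = cmvca_scores(K, y); selected = np.argsort(scores)[::-1][:M]
```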

Thus, in contrast to (uncentered) kernel PCA and kernel ECA, which select the eigen-pairs corresponding to the maximal eigenvalues or entropy values, for the selection of an eigen-pair in CMVCA both the eigenvalue $\lambda_d$ needs to be high and the corresponding eigenvector needs to be angularly aligned to the difference of a pair of class indicator vectors.

A. CMVCA preserves the class-means-to-total-mean distances

In the above we defined CMVCA as the method preserving the pair-wise distances between the class means in $F$. Considering the weighted distance value of dimension $d$, we have:

$D_d = \sum_{k=1}^{C}\sum_{m=1}^{C} p_k p_m \left( (u_d^\top e_k)^2 - 2\, u_d^\top e_k\, u_d^\top e_m + (u_d^\top e_m)^2 \right)$
$\quad = 2 \sum_{k=1}^{C} p_k (u_d^\top e_k)^2 - 2 \left( \sum_{k=1}^{C} p_k\, u_d^\top e_k \right) \left( \sum_{m=1}^{C} p_m\, u_d^\top e_m \right)$
$\quad = 2 \sum_{k=1}^{C} p_k \left( (u_d^\top e_k)^2 - 2\, u_d^\top e_k\, u_d^\top e + (u_d^\top e)^2 \right)$
$\quad = 2 \sum_{k=1}^{C} p_k \left( u_d^\top e_k - u_d^\top e \right)^2,$

and therefore

$D = \sum_{d=1}^{N} \lambda_d D_d = 2 \sum_{k=1}^{C} p_k\, \| m_k - m \|_2^2,$   (9)

where $e \in \mathbb{R}^N$ is a vector having all elements equal to $1/N$ and $m$ is the total mean vector. In the above we used that $\sum_{m=1}^{C} p_m = 1$ and that $e = \sum_{c=1}^{C} p_c e_c$. Thus, the eigen-pairs selected by minimizing the criterion in (8) are those preserving the distances between the class means and the total mean in $F$.

B. CMVCA as the Euclidean divergence between the class probability density functions

Let us assume that the data forming $S_k$ and $S_m$ are drawn from the probability density functions $p_k(x)$ and $p_m(x)$, respectively. The Euclidean divergence between these two probability density functions is given by:

$D(p_k, p_m) = \int p_k^2(x)\,dx - 2 \int p_k(x)\, p_m(x)\,dx + \int p_m^2(x)\,dx.$   (10)

Given the observations of these two probability density functions in $S_k$ and $S_m$, $D(p_k, p_m)$ can be estimated using the Parzen window method [12], [13]. Let $\kappa_\sigma(x_i, \cdot)$ be the Gaussian kernel centered at $x_i$ with width $\sigma$. Then, we have:

$\hat{D}(p_k, p_m) = \frac{1}{N_k^2} \sum_{x_i \in S_k} \sum_{x_j \in S_k} \kappa_\sigma(x_i, x_j) - \frac{2}{N_k N_m} \sum_{x_i \in S_k} \sum_{x_j \in S_m} \kappa_\sigma(x_i, x_j) + \frac{1}{N_m^2} \sum_{x_i \in S_m} \sum_{x_j \in S_m} \kappa_\sigma(x_i, x_j) = e_k^\top K e_k - 2\, e_k^\top K e_m + e_m^\top K e_m,$   (11)

or, expressing it using the eigen-decomposition of $K$:

$\hat{D}(p_k, p_m) = \sum_{d=1}^{N} \lambda_d \left( u_d^\top e_k - u_d^\top e_m \right)^2.$   (12)

Note here that the estimated Euclidean divergence between $p_k(x)$ and $p_m(x)$ takes the same form as the distance between the class mean vectors of classes $c_k$ and $c_m$ in (5). Thus, $D$ in (6) can be expressed as:

$D = \sum_{k=1}^{C}\sum_{m=1}^{C} p_k p_m\, \hat{D}(p_k, p_m).$   (13)

From the above, it can be seen that the dimensions minimizing the criterion in (8) are those optimally preserving the weighted pair-wise Euclidean divergence between the probability density functions of the classes in the input space. Interestingly, exploiting the psd property of the kernel matrix, the analysis in [14], based on the expected value of the kernel convolution operator, shows that the Parzen window method can be replaced by any psd kernel function.
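Equation (11) shows that the Parzen-window estimate of the Euclidean divergence reduces to quadratic forms of $K$ with the class indicator vectors; a short sketch of this computation follows (our own illustration, with assumed names).

```python
import numpy as np

def euclidean_divergence(K, labels, k, m):
    """Parzen-window estimate of the Euclidean divergence between classes k and m,
    computed from the kernel matrix K (Eq. 11):
    e_k^T K e_k - 2 e_k^T K e_m + e_m^T K e_m."""
    e_k = (labels == k) / np.sum(labels == k)
    e_m = (labels == m) / np.sum(labels == m)
    return float(e_k @ K @ e_k - 2.0 * e_k @ K @ e_m + e_m @ K @ e_m)

# The same quantity equals the squared distance between the class means in F (Eq. 5),
# e.g. euclidean_divergence(K, y, 0, 1) for classes 0 and 1.
```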
C. CMVCA in terms of uncentered PCA projections

Let us denote by $v_d$ the $d$-th eigenvector of the scatter matrix $S^{(\phi)} = \Phi \Phi^\top$. $v_d$ is in essence a projection vector defined by applying uncentered kernel PCA on the input vectors $x_i$, $i = 1, \ldots, N$. By using the connection between the eigenvectors of $S^{(\phi)}$ and $K$, i.e. $u_d = \frac{1}{\sqrt{\lambda_d}} \Phi^\top v_d$ [15], we have:

$D = 2 \sum_{d=1}^{N} \sum_{k=1}^{C} p_k \left( v_d^\top \Phi e_k - v_d^\top \Phi e \right)^2 = 2 \sum_{d=1}^{N} \sum_{k=1}^{C} p_k \left( v_d^\top (m_k - m) \right)^2 = 2 \sum_{d=1}^{N} \hat{D}_d,$   (14)

where $\hat{D}_d = \sum_{k=1}^{C} p_k \left( v_d^\top (m_k - m) \right)^2$. Using $\|v_d\|_2^2 = 1$, we get:

$D = 2 \sum_{k=1}^{C} p_k\, \| m_k - m \|_2^2 \sum_{d=1}^{N} \cos^2(v_d, m_k - m).$   (15)

Since $\sum_{d=1}^{N} \cos^2(v_d, m_k - m) = 1$, the contribution of the uncentered kernel PCA axis $v_d$ to $\| m_k - m \|_2^2$ is determined by the cosine of the angle between $v_d$ and $m_k - m$, in the sense that the axes which are most angularly aligned with $m_k - m$ contribute the most. This result adds to the insight provided in [16], [11] and defines CMVCA in terms of the projections obtained by applying uncentered kernel PCA on the input data.

D. Connection between CMVCA and KDA

To analyze the connection between CMVCA and Kernel Discriminant Analysis (KDA), we define the within-class scatter matrix:

$S_w^{(\phi)} = \sum_{k=1}^{C} \sum_{\phi_i \in S_k} (\phi_i - m_k)(\phi_i - m_k)^\top$   (16)

and the between-class scatter matrix:

$S_b^{(\phi)} = \sum_{k=1}^{C} N_k\, (m_k - m)(m_k - m)^\top.$   (17)

The total scatter matrix is then given by $S_t^{(\phi)} = S_w^{(\phi)} + S_b^{(\phi)}$, i.e.:

$S_t^{(\phi)} = \sum_{i=1}^{N} (\phi_i - m)(\phi_i - m)^\top.$   (18)

Using the above scatter matrices, KDA and its variants [17], [18] determine the eigenvectors $v_d$ maximizing the Rayleigh quotient:

$V^* = \underset{V^\top V = I}{\arg\max}\ \frac{\mathrm{tr}\left( V^\top S_b^{(\phi)} V \right)}{\mathrm{tr}\left( V^\top S_t^{(\phi)} V \right)},$   (19)

leading to at most $C - 1$ axes, which are the eigenvectors corresponding to the positive eigenvalues of the generalized eigen-problem $S_b^{(\phi)} v = \lambda S_t^{(\phi)} v$. Here we are interested in the discrimination power, in terms of the KDA criterion, of the axes $v_d$, $d = 1, \ldots, N$, defined from the spectral decomposition of $K$. Expressing the above projections based on the eigenvectors of $S^{(\phi)}$ and assuming the data to be centered, i.e., $m = 0$, the Rayleigh quotient for axis $d$ is equal to:

$\frac{v_d^\top S_b^{(\phi)} v_d}{v_d^\top S_t^{(\phi)} v_d} = \frac{v_d^\top \left( \sum_{k=1}^{C} N_k\, m_k m_k^\top \right) v_d}{v_d^\top \Phi \Phi^\top v_d} = \frac{N \hat{D}_d}{\lambda_d} = N \sum_{k=1}^{C} p_k \left( \frac{1}{\sqrt{\lambda_d}}\, v_d^\top \Phi e_k \right)^2 = N \sum_{k=1}^{C} p_k \left( u_d^\top e_k \right)^2 = \sum_{k=1}^{C} \cos^2(u_d, e_k).$   (20)

The criterion of CMVCA from (6) and (9) for axis $d$ becomes (for centered data, $u_d^\top e = 0$):

$\lambda_d D_d = 2 \sum_{k=1}^{C} p_k\, \lambda_d\, (u_d^\top e_k)^2 = \frac{2}{N} \sum_{k=1}^{C} \lambda_d \cos^2(u_d, e_k).$   (21)

Thus, while in CMVCA an eigen-pair contributes to the criterion based on both the size of $\lambda_d$ and the angular alignment between $u_d$ and the class indicator vectors $e_k$, the criterion of KDA selects dimensions based only on the angular alignment between $u_d$ and the class indicator vectors $e_k$. Note that (20) also gives new insight on why the KDA criterion restricts the dimensionality of the produced subspace by the number of classes. That is, since by definition the $u_d$ form an orthogonal basis, the number of eigenvectors that can be angularly aligned to the class indicator vectors is restricted by the number of classes $C$, which is equal to the rank of the between-class scatter matrix for uncentered data. We will exploit this observation in Section IV to define a discriminative version of CMVCA.

E. CMVCA on approximate kernel subspaces

In cases where the cardinality of $S$ is prohibitive for applying kernel methods, approximate kernel matrix spectral analysis can be used. Probably the most widely used approach is based on the Nyström method, which first chooses a set of $n \ll N$ reference vectors to calculate the kernel matrix between the reference vectors, $K_{nn} \in \mathbb{R}^{n \times n}$, and the matrix $K_{Nn} \in \mathbb{R}^{N \times n}$ containing the kernel function values between the training and reference vectors. In order to determine the reference vectors, two approaches have been proposed. The first is based on selecting $n$ columns of $K$ using random or probabilistic sampling [19], [20], while the second one determines the reference vectors by applying clustering on the training vectors [21], [22]. The Nyström-based approximation of $K$ is given by:

$K \simeq K_{Nn} K_{nn}^{-1} K_{Nn}^\top = \left( K_{nn}^{-\frac{1}{2}} K_{Nn}^\top \right)^\top \left( K_{nn}^{-\frac{1}{2}} K_{Nn}^\top \right) = \tilde{\Phi}_n^\top \tilde{\Phi}_n,$   (22)

where $\tilde{\Phi}_n \in \mathbb{R}^{n \times N}$. When the ranks of $K$ and $K_{nn}$ match, (22) gives an exact calculation of $K$ and $\tilde{\Phi}_n$ is the same as $\tilde{\Phi}$ defined in Section II. Eigen-decomposition of $K_{nn} = U_n \Lambda_n U_n^\top$ leads to $K_{nn}^{-\frac{1}{2}} = U_n \Lambda_n^{-\frac{1}{2}} U_n^\top$. When $K_{nn}$ is a rank-$n$ matrix, the matrices $\tilde{\Phi}_n^\top \tilde{\Phi}_n$ and $\tilde{\Phi}_n \tilde{\Phi}_n^\top$ have the same $n$ leading eigenvalues [15]. The matrix $\tilde{\Phi}_n \tilde{\Phi}_n^\top$ is an $n \times n$ matrix and, thus, applying eigen-analysis to it can highly reduce the computational complexity required for the calculation of the eigenvalues $\Lambda_{(n)}$ of the approximate kernel matrix in (22). Finally, $x_i$ is represented in the approximate kernel subspace by:

$\tilde{\phi}_i = K_{nn}^{-\frac{1}{2}}\, k_i^{(n)} = U_n \Lambda_n^{-\frac{1}{2}} U_n^\top k_i^{(n)},$   (23)

where $k_i^{(n)} \in \mathbb{R}^n$ contains the kernel function values between $x_i$ and the reference vectors. After the calculation of $\tilde{\Phi}_n = [\tilde{\phi}_1, \ldots, \tilde{\phi}_N]$, the leading eigenvectors $u_d$ and the corresponding eigenvalues $\lambda_d$, $d = 1, \ldots, n$, of the approximate kernel matrix correspond to the right singular vectors and the (squared) singular values of $\tilde{\Phi}_n$.
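A compact sketch of the Nyström route in (22)-(23) is given below. It is our own illustration under the definitions above (reference vectors are chosen here by plain random sampling, and a small jitter term is added for numerical invertibility); all names are assumptions.

```python
import numpy as np

def nystrom_features(X, n_ref, sigma, rng=None, jitter=1e-10):
    """Map data to the approximate kernel subspace of Eqs. (22)-(23):
    phi_i = K_nn^{-1/2} k_i^{(n)}, with reference vectors sampled at random."""
    rng = np.random.default_rng(rng)
    ref = X[rng.choice(len(X), size=n_ref, replace=False)]

    def gauss(A, B):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-np.maximum(d2, 0) / (2 * sigma**2))

    K_nn = gauss(ref, ref) + jitter * np.eye(n_ref)   # regularize for invertibility
    K_Nn = gauss(X, ref)

    lam_n, U_n = np.linalg.eigh(K_nn)
    K_nn_inv_sqrt = U_n @ np.diag(1.0 / np.sqrt(lam_n)) @ U_n.T
    Phi_n = K_nn_inv_sqrt @ K_Nn.T                    # columns are the phi_i of Eq. (23)

    # The right singular vectors / squared singular values of Phi_n give the
    # eigen-pairs of the approximate kernel matrix K ~= Phi_n^T Phi_n.
    return Phi_n, ref
```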
F. CMVCA on randomized kernel subspaces

Another approach proposed for making the use of kernel methods in big data feasible is based on random nonlinear mappings [23]. A nonlinear mapping $x_i \in \mathbb{R}^D \rightarrow z_i \in \mathbb{R}^n$ is defined such that $z_i^\top z_j \simeq \kappa(x_i, x_j)$ or, using the entire dataset, $Z^\top Z \simeq K$. Singular value decomposition of $Z$ can lead to a basis in a subspace of $\mathbb{R}^n$. That is, the right singular vectors of $Z$ corresponding to the non-zero singular values define the axes minimizing $\tilde{D}^{(z)} = \left( D^{(z)} - D^{(z)}_{1:M} \right)^2$, where $D^{(z)}$ is the CMVCA criterion (6) calculated using $Z$.
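For the Gaussian kernel, such a random mapping can be realized with random Fourier features [23]. The sketch below (our own, with assumed names) builds $Z$ so that $Z^\top Z$ approximates $K$; its singular value decomposition then provides the quantities needed to evaluate the CMVCA criterion (6) on the randomized subspace.

```python
import numpy as np

def random_fourier_features(X, n, sigma, rng=None):
    """Random Fourier features for the Gaussian kernel [23]: the columns z_i of the
    returned matrix Z (n x N) satisfy z_i^T z_j ~= exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(rng)
    N, D = X.shape
    W = rng.normal(scale=1.0 / sigma, size=(n, D))    # spectral samples of the Gaussian kernel
    b = rng.uniform(0.0, 2.0 * np.pi, size=(n, 1))
    Z = np.sqrt(2.0 / n) * np.cos(W @ X.T + b)        # (n, N)
    return Z

# Z.T @ Z approximates the kernel matrix K, so the SVD of Z replaces the
# eigen-decomposition of K when evaluating the CMVCA criterion.
```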

IV. CLASS MEAN VECTOR DISCRIMINANT ANALYSIS

An interesting extension of CMVCA is motivated by the connection of CMVCA with KDA obtained by following the analysis in Subsection III-D. By comparing (20) and (21) we see that, in the case where $\lambda_d = 1$, $d = 1, \ldots, N$, the scores calculated for the kernel subspace dimensions by CMVCA and KDA are the same. This situation arises when the data $\Phi$ is whitened, i.e. when $S^{(\phi)} = \Phi \Phi^\top = I$, where $I$ is the identity matrix. Interestingly, the information needed for whitening $\Phi$ can be directly obtained by the eigen-analysis of $K = \Phi^\top \Phi$, since there is a direct connection between the eigenvalues and eigenvectors of $S^{(\phi)}$ and $K$.

Given a kernel matrix $K = \Phi^\top \Phi$, where $\Phi$ is whitened, application of CMVCA requires the eigen-decomposition of $K$ for calculating the eigenvectors $u_d$, $d = 1, \ldots, N$, and the corresponding eigenvalues $\lambda_d$ to be used for weighting the dimensions of the kernel subspace based on (6). $K$ has all its eigenvalues equal to $\lambda_d = 1$, $d = 1, \ldots, N$, and, thus, its eigenvectors form the axes of an arbitrary orthonormal basis, i.e.:

$\{u_d\}_{d=1}^{N}, \quad u_i^\top u_i = 1, \quad u_i^\top u_j = 0,\ i \neq j.$   (24)

Such a basis can be efficiently calculated by applying an orthogonalization process (e.g. Cholesky decomposition) starting from a vector belonging to the span of $\Phi$. The vector $e$ is such a vector and, thus, can be used for generating the basis. Moreover, the scaled class indicator vectors $\sqrt{N_c}\, e_c$, $c = 1, \ldots, C$, belong to the span of $\Phi$ and also satisfy the two properties, i.e., $(\sqrt{N_k}\, e_k)^\top (\sqrt{N_k}\, e_k) = 1$ and $(\sqrt{N_k}\, e_k)^\top (\sqrt{N_m}\, e_m) = 0$, $k \neq m$. Thus, they can be selected to form the first $C$ eigenvectors of $K$. Note that from (21) it can be seen that these vectors contribute the most to the Rayleigh quotient criterion. To form the remaining $N - C$ basis vectors, we can apply an orthogonalization process on the subspaces determined by each class indicator vector $e_c$, each generating a basis in $\mathbb{R}^{N_c}$ appended by zeros for the remaining dimensions, leading to $\sum_{c=1}^{C} N_c = N$ eigenvectors in total. In order to apply CMVDA in the approximate and randomized kernel cases described in Subsections III-E and III-F, the sample representations in $\mathbb{R}^n$ are whitened and we follow the above-described process to determine the eigen-pairs used for the CMVDA-based projection.

V. EXPERIMENTS

In our experiments we used four datasets, namely the MNIST [24], AR [25], 15 scene [26] and MIT indoor [27] datasets. For the MNIST dataset, we used the first 100 training samples per class to form the training set and we report performance on the entire test set. For the remaining datasets, we perform ten experiments by randomly keeping half of the samples per class for training and the rest for evaluation, and we report the average performance over the ten experiments. We used the vectorized pixel intensity values for representing the images in the MNIST and AR datasets. For the 15 scene and MIT indoor datasets we used deep features generated by average pooling over the spatial dimensions of the last convolutional layer of the VGG network [28] trained on the ILSVRC2012 database, and we follow the approximate kernel and randomized kernel approaches of Subsections III-E and III-F (with a fixed value of $n$). Details of these datasets are shown in Table I.

TABLE I — DATASETS USED IN THE EXPERIMENTS. Columns: Dataset, D, C, N; rows: MNIST, AR, 15 scene, MIT-indoor.

In all experiments we used the Gaussian kernel function and set the width value equal to the mean pair-wise distance between all training samples.
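The kernel width heuristic and the classifier used in the evaluation protocol are straightforward to reproduce; the sketch below is our own approximation of that setup (names are assumptions, and the paper's exact pipeline may differ in details).

```python
import numpy as np

def mean_pairwise_distance(X):
    """Gaussian kernel width heuristic: mean pair-wise Euclidean distance
    between all training samples."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    d = np.sqrt(np.maximum(d2, 0))
    return d[np.triu_indices(len(X), k=1)].mean()

def nearest_class_centroid(Y_train, y_train, Y_test):
    """Classify subspace representations by the nearest class centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([Y_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Y_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]

# sigma = mean_pairwise_distance(X_train); predictions on any M-dimensional subspace
# representation can then be compared across kPCA, kECA, CMVCA, KDA and CMVDA.
```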

In order to illustrate the effect of using different subspace dimensionalities, we used the nearest class centroid classifier on all possible subspaces produced by each of the methods.

Fig. 1. Classification rate vs. subspace dimensionality. From top-left to bottom-right: MNIST-100, AR, 15 scene using the Nyström approximation, 15 scene using random features, MIT indoor using the Nyström approximation, and MIT indoor using random features.

Figure 1 illustrates the performance obtained by applying kernel PCA, kernel ECA, KDA and the proposed CMVCA, CMVDA, and the CMVDA-R variant using the random basis of the kernel matrix produced by the whitened kernel effective space (Eq. (24)), as a function of the subspace dimensionality. CMVCA performs on par with kPCA and kECA on the MNIST-100 and AR datasets, while it outperforms them for small subspace dimensionalities on 15 scene and MIT Indoor. On the 15 scene dataset, CMVCA clearly outperforms kPCA and kECA, probably due to the unimodal structure of the classes obtained by using CNN features. CMVDA and KDA outperform kPCA, kECA and CMVCA by using a small subspace dimensionality (equal to $C$ and $C - 1$, respectively), while the performance obtained by applying the CMVDA-R variant gradually increases to match the performance of CMVDA when all the dimensions are used. For completeness, we provide the maximum performance obtained by each method in Table II.

TABLE II — CLASSIFICATION RATES (%) OVER THE VARIOUS DATASETS. Columns: Dataset, kPCA, kECA, CMVCA, KDA, CMVDA; rows: MNIST, AR, 15 scene (N), MIT Indoor (N), 15 scene (R), MIT Indoor (R), where (N) and (R) denote the Nyström and random-feature approaches, respectively.

Fig. 2. Rayleigh quotient vs. subspace dimensionality on the AR and MIT indoor datasets.

Figure 2 illustrates the Rayleigh quotient values as a function of the dimensionality of the subspace produced by all methods for the AR and MIT indoor datasets. As can be seen, the values of the Rayleigh quotient of the subspaces obtained by applying the unsupervised methods are, as expected, low. The subspaces obtained by KDA lead to a high value, which gradually decreases as more dimensions are added. CMVDA also leads to subspaces with a high Rayleigh quotient value which is gradually reduced, similarly to KDA. Similar behaviors were observed for the rest of the datasets.

VI. CONCLUSION

In this paper, we proposed a component analysis method for kernel-based dimensionality reduction preserving the distances between the class means in the kernel space. Analysis of the proposed criterion shows that it also preserves the distances between the class means and the total mean in the kernel space, as well as the Euclidean divergence between the class probability density functions in the input space. Moreover, we showed that the proposed criterion, while expressing different properties, has relations to the criteria used in kernel principal component analysis and kernel discriminant analysis. The latter connection leads to a discriminant analysis version of the proposed method. The properties of the proposed approach were illustrated through experiments on real-world data.

REFERENCES

[1] J. Yang, A. Frangi, J. Yang, D. Zhang, and Z. Jin, KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 2.
[2] K. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, An introduction to kernel-based learning algorithms, IEEE Transactions on Neural Networks, vol. 12, no. 2.
[3] J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8.
[4] R. Jenssen, Kernel entropy component analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 5.
[5] F. Schleif and P. Tiño, Indefinite proximity learning: A review, Neural Computation, vol. 27, no. 10.
[6] A. Gisbrecht and F. Schleif, Metric and non-metric proximity transformations at linear costs, Neurocomputing, vol. 167.
[7] M. Balcan, A. Blum, and N. Srebro, A theory of learning with similarity functions, Machine Learning, vol. 72, no. 1-2.
[8] V. Vapnik, Statistical Learning Theory. New York: Wiley.
[9] N. Kwak, Nonlinear projection trick in kernel methods: an alternative to the kernel trick, IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 12.
[10] N. Kwak, Implementing kernel methods incrementally by incremental nonlinear projection trick, IEEE Transactions on Cybernetics, vol. 47, no. 11.
[11] R. Jenssen, Mean vector component analysis for visualization and clustering of nonnegative data, IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 10.
[12] J. Principe, Information Theoretic Learning: Renyi Entropy and Kernel Perspectives. Springer.
[13] C. Williams and M. Seeger, The effect of the input density distribution on kernel-based classifiers, International Conference on Machine Learning.
[14] R. Jenssen, Kernel entropy component analysis: new theory and semi-supervised learning, IEEE International Workshop on Machine Learning for Signal Processing.
[15] N. Wermuth and H. Rüssmann, Eigenanalysis of symmetrizable matrix products: a result with statistical applications, Scandinavian Journal of Statistics, vol. 20.
[16] J. Cadima and I. Jolliffe, On relationships between uncentered and column-centered principal component analysis, Pakistan Journal of Statistics, vol. 25, no. 4.
[17] A. Iosifidis, A. Tefas, and I. Pitas, On the optimal class representation in Linear Discriminant Analysis, IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 9.
[18] A. Iosifidis, A. Tefas, and I. Pitas, Kernel reference discriminant analysis, Pattern Recognition Letters, vol. 49.
[19] C. Williams and M. Seeger, Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems.
[20] S. Kumar, M. Mohri, and A. Talwalkar, Sampling techniques for the Nyström method, Journal of Machine Learning Research, vol. 13, no. 1.
[21] K. Zhang and J. Kwok, Clustered Nyström method for large scale manifold learning and dimensionality reduction, IEEE Transactions on Neural Networks, vol. 21, no. 10.
[22] A. Iosifidis and M. Gabbouj, Nyström-based approximate kernel subspace learning, Pattern Recognition, vol. 57.
[23] A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in Neural Information Processing Systems.
[24] Y. LeCun, L. Bottou, and Y. Bengio, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol. 86.
[25] A. Martinez and A. Kak, PCA versus LDA, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2.
[26] S. Lazebnik, C. Schmid, and J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, IEEE Conference on Computer Vision and Pattern Recognition.
[27] A. Quattoni and A. Torralba, Recognizing indoor scenes, IEEE Conference on Computer Vision and Pattern Recognition.
[28] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint, 2014.


More information

PCA and LDA. Man-Wai MAK

PCA and LDA. Man-Wai MAK PCA and LDA Man-Wai MAK Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University enmwmak@polyu.edu.hk http://www.eie.polyu.edu.hk/ mwmak References: S.J.D. Prince,Computer

More information

Approximate Kernel PCA with Random Features

Approximate Kernel PCA with Random Features Approximate Kernel PCA with Random Features (Computational vs. Statistical Tradeoff) Bharath K. Sriperumbudur Department of Statistics, Pennsylvania State University Journées de Statistique Paris May 28,

More information

Statistical learning theory, Support vector machines, and Bioinformatics

Statistical learning theory, Support vector machines, and Bioinformatics 1 Statistical learning theory, Support vector machines, and Bioinformatics Jean-Philippe.Vert@mines.org Ecole des Mines de Paris Computational Biology group ENS Paris, november 25, 2003. 2 Overview 1.

More information

Spectral Algorithms I. Slides based on Spectral Mesh Processing Siggraph 2010 course

Spectral Algorithms I. Slides based on Spectral Mesh Processing Siggraph 2010 course Spectral Algorithms I Slides based on Spectral Mesh Processing Siggraph 2010 course Why Spectral? A different way to look at functions on a domain Why Spectral? Better representations lead to simpler solutions

More information

arxiv: v1 [cs.cv] 8 Oct 2018

arxiv: v1 [cs.cv] 8 Oct 2018 Deep LDA Hashing Di Hu School of Computer Science Northwestern Polytechnical University Feiping Nie School of Computer Science Northwestern Polytechnical University arxiv:1810.03402v1 [cs.cv] 8 Oct 2018

More information

Spectral Clustering of Polarimetric SAR Data With Wishart-Derived Distance Measures

Spectral Clustering of Polarimetric SAR Data With Wishart-Derived Distance Measures Spectral Clustering of Polarimetric SAR Data With Wishart-Derived Distance Measures STIAN NORMANN ANFINSEN ROBERT JENSSEN TORBJØRN ELTOFT COMPUTATIONAL EARTH OBSERVATION AND MACHINE LEARNING LABORATORY

More information

Less is More: Computational Regularization by Subsampling

Less is More: Computational Regularization by Subsampling Less is More: Computational Regularization by Subsampling Lorenzo Rosasco University of Genova - Istituto Italiano di Tecnologia Massachusetts Institute of Technology lcsl.mit.edu joint work with Alessandro

More information

Non-negative Matrix Factorization on Kernels

Non-negative Matrix Factorization on Kernels Non-negative Matrix Factorization on Kernels Daoqiang Zhang, 2, Zhi-Hua Zhou 2, and Songcan Chen Department of Computer Science and Engineering Nanjing University of Aeronautics and Astronautics, Nanjing

More information

Lecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University

Lecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University Lecture 4: Principal Component Analysis Aykut Erdem May 016 Hacettepe University This week Motivation PCA algorithms Applications PCA shortcomings Autoencoders Kernel PCA PCA Applications Data Visualization

More information

Classification on Pairwise Proximity Data

Classification on Pairwise Proximity Data Classification on Pairwise Proximity Data Thore Graepel, Ralf Herbrich, Peter Bollmann-Sdorra y and Klaus Obermayer yy This paper is a submission to the Neural Information Processing Systems 1998. Technical

More information

PCA, Kernel PCA, ICA

PCA, Kernel PCA, ICA PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per

More information

Clustering VS Classification

Clustering VS Classification MCQ Clustering VS Classification 1. What is the relation between the distance between clusters and the corresponding class discriminability? a. proportional b. inversely-proportional c. no-relation Ans:

More information

Discriminant Uncorrelated Neighborhood Preserving Projections

Discriminant Uncorrelated Neighborhood Preserving Projections Journal of Information & Computational Science 8: 14 (2011) 3019 3026 Available at http://www.joics.com Discriminant Uncorrelated Neighborhood Preserving Projections Guoqiang WANG a,, Weijuan ZHANG a,

More information