CPM: A Covariance-preserving Projection Method

Jieping Ye (Department of Computer Science and Engineering, Arizona State University), Tao Xiong (Department of Electrical and Computer Engineering, University of Minnesota), Ravi Janardan (Department of Computer Science and Engineering, University of Minnesota)

Abstract

Dimension reduction is critical in many areas of data mining and machine learning. In this paper, a Covariance-preserving Projection Method (CPM for short) is proposed for dimension reduction. CPM maximizes the class discrimination and also approximately preserves the class covariance. The optimization involved in CPM can be formulated as low rank approximations of a collection of matrices, which can be solved iteratively. Our theoretical and empirical analysis reveals the relationship between CPM and Linear Discriminant Analysis (LDA), the Sliced Average Variance Estimator (SAVE), and Heteroscedastic Discriminant Analysis (HDA). This gives us new insights into the nature of these different algorithms. We use both synthetic and real-world datasets to evaluate the effectiveness of the proposed algorithm.

Keywords: Dimension reduction, linear discriminant analysis, heteroscedastic discriminant analysis, covariance.

1 Introduction

Linear Discriminant Analysis (LDA) [4, 7, 8] is a well-known scheme for feature extraction and dimension reduction. LDA projects the data onto a lower-dimensional vector space such that the ratio of the between-class distance to the within-class distance is maximized, thus achieving maximum discrimination. LDA is equivalent to maximum likelihood classification assuming a normal distribution for each class with a common covariance matrix. It has been applied successfully in many areas, such as computer vision and bioinformatics [1, 6, 11, 15]. However, LDA has several limitations: (1) it fails to recover the features for classification when all class centroids coincide; (2) it may not find the best projection when the class covariance matrices vary; and (3) the reduced dimension of LDA is no larger than k - 1 (k denotes the number of classes), which may not be sufficient for complex data.

Generalization of LDA by fitting Gaussian mixtures to each class has been studied by Hastie and Tibshirani [10]. Cook et al. [2, 3] proposed the Sliced Average Variance Estimator (SAVE), which is shown to be capable of dealing with the limitations of LDA. Zhu and Hastie [16] developed a general method for finding important discriminant directions without assuming that the class densities belong to any particular parametric family. Kumar and Andreou [13] proposed HDA, based on a different model in which the classes are still Gaussian yet are allowed to have different covariance matrices, under the condition that both the centroids and the covariance matrices coincide in a subspace of the observation space.

In this paper, we propose a new algorithm for dimension reduction, called CPM (which stands for Covariance-preserving Projection Method). CPM aims to maximize the class discrimination and at the same time approximately preserve the class covariance, by applying a tuning parameter α between 0 and 1. One key feature of CPM is that the critical information on the class covariance is preserved under the projection. With a properly chosen tuning parameter α, which may be dependent on the data distribution, the CPM algorithm is able to deal with difficult situations encountered in LDA. In practice, the best value of α can be estimated by cross-validation.

To illustrate the difference between CPM and LDA, we generated a synthetic dataset with 10 dimensions and 3 classes as in [16].
Fig. 1 (top) shows the first two coordinates. For the first two coordinates, class 1 is simulated from a standard multivariate Gaussian, while classes 2 and 3 are mixtures of two symmetrically shifted standard Gaussians. In the remaining 8 coordinates, zero-mean Gaussian noise is used for all three classes. It is clear from Fig. 1 (middle) that LDA fails to extract important features because the class centroids coincide. CPM separates the three classes completely, as shown in Fig. 1 (bottom), which demonstrates the advantage of incorporating the class covariance information in CPM.
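The Python sketch below generates data in the spirit of this example. The per-class sample size, the mixture shift, and the coordinate along which each class is shifted are illustrative assumptions rather than the paper's exact settings, and the function name is ours; the key property, identical class centroids with different class covariances, holds by construction.

```python
import numpy as np

def make_coinciding_centroids_data(n_per_class=200, shift=3.0, noise_dim=8, seed=0):
    """Three 10-dimensional classes whose centroids all coincide at the origin.

    Class 1: standard 2-D Gaussian in the first two coordinates.
    Classes 2 and 3: balanced mixtures of two standard Gaussians shifted
    symmetrically by +/- `shift` (class 2 along the first coordinate,
    class 3 along the second), so each class mean is still the origin.
    The remaining `noise_dim` coordinates are zero-mean Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label in (1, 2, 3):
        first_two = rng.standard_normal((n_per_class, 2))
        if label in (2, 3):
            signs = rng.choice([-1.0, 1.0], size=n_per_class)
            first_two[:, label - 2] += signs * shift   # axis 0 for class 2, axis 1 for class 3
        noise = rng.standard_normal((n_per_class, noise_dim))
        X.append(np.hstack([first_two, noise]))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

X, y = make_coinciding_centroids_data()
print(X.shape)                                                        # (600, 10)
print(np.round([X[y == c].mean(axis=0)[:2] for c in (1, 2, 3)], 2))   # all centroids near (0, 0)
```

Because every class centroid sits at the origin, $S_b \approx 0$ and LDA has essentially nothing to maximize, while the class covariances still differ, which is exactly the information CPM exploits.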

The optimization (maximization) problem involved in CPM is nonlinear and difficult to solve. We formulate a lower bound for the criterion function used in CPM. The maximization of the derived lower bound can be formulated as low rank approximations of a collection of symmetric and positive semi-definite matrices, a special case of the low rank approximations in [5, 14]. To the best of our knowledge, there is no closed form solution. We derive an iterative algorithm, which updates the projection successively and converges to a local optimum. Unlike LDA, the reduced dimension d of CPM can be larger than k - 1.

We also study the relationship between CPM and SAVE and HDA. SAVE is shown to be closely related to a special case of CPM with α = 0.5. Recall that the parameter α controls the tradeoff between the separation of the class centroids and the preservation of the class covariances. The optimal value of α may depend on the data distribution. CPM is more flexible in dealing with different situations by varying the value of α than SAVE, where α is fixed to be 0.5. Our theoretical analysis also reveals that CPM is an approximation of HDA. Note that HDA involves complex and nonlinear optimizations [13]. It has been observed in [13] that HDA has a high computational cost, especially for large datasets, and that HDA may not complete, since the numerical optimization procedure in HDA may not converge. We show that the optimization problem involved in CPM is simpler and much easier to solve than that of HDA. The theoretical analysis further justifies the use of the covariance information in CPM and gives us new insights into the nature of these different algorithms.

We perform experiments on both synthetic and real-world datasets to evaluate the effectiveness of CPM and compare it with other well-known algorithms. Our experiments show that: (1) CPM is able to recover the features for classification, even when all class centroids coincide or the covariance matrices vary; (2) CPM is competitive with SAVE and LDA in classification; (3) CPM is comparable to HDA in classification; and (4) HDA does not complete in several cases. Thus CPM may be a good alternative to HDA as a generalization of LDA for difficult situations where all class centroids coincide or the covariance matrices vary.

Figure 1: Top: original data (the first two coordinates); Middle: projection by LDA; Bottom: projection by CPM. Note that the scales for the different algorithms vary.

The rest of the paper is organized as follows: Section 2 presents the CPM algorithm; the relationship between CPM and SAVE and HDA is discussed in Section 3; experimental results are presented in Section 4; and the paper is concluded in Section 5.

2 Covariance-preserving Projection Method

Given a data matrix $A \in \mathbb{R}^{n \times N}$, consisting of $N$ data points in $n$-dimensional space, we wish to find a vector space $\mathcal{G}$ spanned by $\{g_i\}_{i=1}^d$, where $g_i \in \mathbb{R}^n$, such that each data point $a_i \in \mathbb{R}^n$ of $A$ is projected onto $\mathcal{G}$ by
$$G^T a_i = (g_1^T a_i, \ldots, g_d^T a_i)^T \in \mathbb{R}^d,$$
with $d < n$. Here $G = [g_1, \ldots, g_d] \in \mathbb{R}^{n \times d}$ denotes the projection.

Assume that the data in $A$ is partitioned into $k$ classes as $A = \{\Pi_1, \ldots, \Pi_k\}$, where $\Pi_i$ contains $N_i$ data points from the $i$-th class and $\sum_{i=1}^k N_i = N$. The covariance matrix of the $i$-th class is defined as
$$W_i = \frac{1}{N_i}\sum_{x \in \Pi_i} (x - c_i)(x - c_i)^T,$$
where $c_i = \frac{1}{N_i}\sum_{x \in \Pi_i} x$ is the centroid of the $i$-th class. In discriminant analysis, three scatter matrices, called the within-class ($S_w$), between-class ($S_b$), and total ($S_t$) scatter matrices, are defined as follows [7]:
$$S_w = \sum_{i=1}^k p_i W_i, \quad S_b = \sum_{i=1}^k p_i (c_i - c)(c_i - c)^T, \quad S_t = \frac{1}{N}\sum_{x} (x - c)(x - c)^T,$$
where $p_i = N_i/N$ and $c = \frac{1}{N}\sum_{x} x$ is the global centroid. It is easy to verify that $S_t = S_w + S_b$. In the low-dimensional space resulting from the linear projection $G$, the scatter matrices become $S_b^L = G^T S_b G$, $S_w^L = G^T S_w G$, and $S_t^L = G^T S_t G$. The optimal projection $G$ in LDA [7] can be computed by solving the following optimization problem:
$$(2.1)\quad G^* = \arg\max_{G:\,G^T S_t G = I_d} \operatorname{trace}(G^T S_b G).$$
Assuming that the total scatter matrix has been normalized, i.e., $S_t = I_n$, the optimal projection for LDA can be obtained by maximizing $\operatorname{trace}(G^T S_b G)$ subject to the orthogonality constraint $G^T G = I_d$. The solution is given by the top eigenvectors of $S_b$. There are at most $k - 1$ eigenvectors corresponding to nonzero eigenvalues, since the rank of the matrix $S_b$ is bounded from above by $k - 1$. Therefore, the reduced dimension of LDA is at most $k - 1$.

2.1 Problem formulation

LDA maximizes the separation between different classes by maximizing $\operatorname{trace}(G^T S_b G)$. It is optimal when all classes have a common covariance matrix. However, it ignores the class covariance information and may not be effective when the class centroids are close to each other (note that $S_b = 0$ when the class centroids coincide) or when the class covariance matrices vary. The CPM algorithm proposed below attempts to overcome these limitations of LDA by considering the information from both the class centroids and the class covariances.

Let us first consider Principal Component Analysis (PCA) [12] on the $i$-th class. Let $G$ be the optimal projection consisting of the first $d$ principal components of $W_i$ (the covariance matrix of the $i$-th class). The information loss, or the approximation error incurred by keeping only the $d$ principal components, is $\|W_i - G(G^T W_i G)G^T\|_F$, where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix [9]. The projection in PCA minimizes this approximation error [12] and can be computed via the Singular Value Decomposition (SVD) [9].

In CPM, we consider all $k$ classes simultaneously and aim to find the projection $G$ such that the weighted approximation error over all classes is minimized. Mathematically, CPM aims to minimize the following weighted approximation error:
$$(2.2)\quad \sum_{i=1}^k p_i \left\| (W_i - S_w) - G\big(G^T (W_i - S_w) G\big)G^T \right\|_F^2,$$
which can be shown to be equal to
$$\sum_{i=1}^k p_i \|W_i - S_w\|_F^2 - \sum_{i=1}^k p_i \|G^T (W_i - S_w) G\|_F^2,$$
assuming $G$ has orthonormal columns. Thus CPM maximizes
$$(2.3)\quad \sum_{i=1}^k p_i \|G^T (W_i - S_w) G\|_F^2,$$
which is the weighted variance of the $k$ covariance matrices after the projection $G$.
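A minimal numerical sketch of the quantities defined above follows; the helper names (`scatter_matrices`, `whiten`) are ours, and the $1/N_i$ normalization of $W_i$ and the weights $p_i = N_i/N$ follow the definitions used here. The last few lines check the identity stated before (2.3) for a random orthonormal $G$.

```python
import numpy as np

def scatter_matrices(X, y):
    """Class priors p_i, class covariances W_i, and S_w, S_b, S_t (rows of X are samples)."""
    N, n = X.shape
    c = X.mean(axis=0)
    p, W = [], []
    Sw, Sb = np.zeros((n, n)), np.zeros((n, n))
    for cls in np.unique(y):
        Xi = X[y == cls]
        pi, ci = len(Xi) / N, Xi.mean(axis=0)
        Wi = (Xi - ci).T @ (Xi - ci) / len(Xi)
        p.append(pi); W.append(Wi)
        Sw += pi * Wi
        Sb += pi * np.outer(ci - c, ci - c)
    St = (X - c).T @ (X - c) / N          # equals Sw + Sb up to rounding
    return np.array(p), W, Sw, Sb, St

def whiten(X, eps=1e-10):
    """Linearly transform X so that its total scatter S_t becomes (approximately) I_n."""
    c = X.mean(axis=0)
    St = (X - c).T @ (X - c) / len(X)
    evals, evecs = np.linalg.eigh(St)
    return (X - c) @ (evecs / np.sqrt(np.maximum(evals, eps)))

# Check: for orthonormal G, the weighted approximation error (2.2) equals
# sum_i p_i ||W_i - S_w||_F^2 - sum_i p_i ||G^T (W_i - S_w) G||_F^2.
rng = np.random.default_rng(0)
Xw = whiten(rng.standard_normal((300, 6)))
y = rng.integers(1, 4, size=300)
p, W, Sw, Sb, St = scatter_matrices(Xw, y)
G, _ = np.linalg.qr(rng.standard_normal((6, 2)))      # random orthonormal G, d = 2
Ms = [Wi - Sw for Wi in W]
lhs = sum(pi * np.linalg.norm(M - G @ (G.T @ M @ G) @ G.T, 'fro') ** 2 for pi, M in zip(p, Ms))
rhs = sum(pi * (np.linalg.norm(M, 'fro') ** 2 - np.linalg.norm(G.T @ M @ G, 'fro') ** 2)
          for pi, M in zip(p, Ms))
print(np.allclose(lhs, rhs))                          # True
```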

Note that $S_w$ is involved in (2.3), since $S_w = \sum_{i=1}^k p_i W_i$ is the weighted sum of $\{W_i\}_{i=1}^k$.

Following LDA, CPM considers the class discrimination (the separation between different centroids) by maximizing $\operatorname{trace}(G^T S_b G)$. Moreover, CPM attempts to also preserve the class covariances. Specifically, CPM considers the information from both the class centroids and the class covariances simultaneously, via a tuning parameter α between 0 and 1. Mathematically, assuming $S_t$ has been normalized, CPM computes the optimal $G$ such that
$$(2.4)\quad G^* = \arg\max_{G:\,G^T G = I_d} h_\alpha(G),$$
where $0 \le \alpha \le 1$ and
$$h_\alpha(G) = (1-\alpha)\operatorname{trace}(G^T S_b G) + \alpha \sum_{i=1}^k p_i \|G^T (W_i - S_w) G\|_F^2.$$
The optimization in (2.4) is nonlinear and difficult to solve. Instead, we maximize a lower bound, $f_\alpha(G)$, of $h_\alpha(G)$, where
$$f_\alpha(G) = (1-\alpha)\|G^T S_b G\|_F^2 + \alpha \sum_{i=1}^k p_i \|G^T (W_i - S_w) G\|_F^2.$$
It is easy to check that
$$\|G^T S_b G\|_F^2 = \operatorname{trace}(G^T S_b G\, G^T S_b G) \le \operatorname{trace}(G^T S_b G),$$
since $G$ has orthonormal columns and $S_t = I_n$. Thus $f_\alpha(G) \le h_\alpha(G)$. Furthermore, we have the following result:

Lemma 2.1. Let $S_b$ and $G$ be defined as above. Then
$$\arg\max_{G:\,G^T G = I_d} \operatorname{trace}(G^T S_b G\, G^T S_b G) = \arg\max_{G:\,G^T G = I_d} \operatorname{trace}(G^T S_b G).$$

Proof. Let $S_b = U \Sigma U^T$ be the SVD of $S_b$, where $U \in \mathbb{R}^{n \times q}$ has orthonormal columns, $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_q)$, $\sigma_1 \ge \cdots \ge \sigma_q > 0$, and $q = \operatorname{rank}(S_b)$. Since $G^T G = I_d$, it can be shown [9] that $\operatorname{trace}(G^T S_b G) \le \sum_{i=1}^q \sigma_i$, and the equality holds when $G = U$. Next, we show that
$$U = \arg\max_{G:\,G^T G = I_d} \operatorname{trace}(G^T S_b G\, G^T S_b G).$$
The proof follows from
$$\operatorname{trace}(G^T S_b G\, G^T S_b G) \le \operatorname{trace}(G^T S_b^2 G) \le \sum_{i=1}^q \sigma_i^2$$
and
$$\operatorname{trace}(U^T S_b U\, U^T S_b U) = \operatorname{trace}(\Sigma^2) = \sum_{i=1}^q \sigma_i^2.$$

Lemma 2.1 will be used in Proposition 2.1 below to show the relationship between CPM and LDA. The optimization involved in CPM is thus approximated as finding $G$ such that
$$(2.5)\quad G^* = \arg\max_{G:\,G^T G = I_d} f_\alpha(G).$$

2.2 The CPM algorithm

Define $w_0 = 1 - \alpha$, $M_0 = S_b$, and $w_i = \alpha p_i$, $M_i = W_i - S_w$, for all $1 \le i \le k$. Then
$$(2.6)\quad f_\alpha(G) = \sum_{i=0}^k w_i \|G^T M_i G\|_F^2 = \sum_{i=0}^k w_i \operatorname{trace}\big((G^T M_i G)(G^T M_i G)\big).$$
To the best of our knowledge, there is no closed form solution to the maximization problem in (2.5) with $f_\alpha(G)$ as in (2.6) [5, 14]. Instead, we derive an iterative algorithm, which computes a sequence of matrices $G^{(j)}$, for $j \ge 0$. More specifically, $G^{(j+1)}$ is computed so that
$$G^{(j+1)} = \arg\max_{G:\,G^T G = I_d} \sum_{i=0}^k w_i \operatorname{trace}\big(G^T M_i G^{(j)} (G^{(j)})^T M_i G\big).$$
It can be shown [14] that $G^{(j+1)}$ consists of the first $d$ eigenvectors of $\sum_{i=0}^k w_i \big(M_i G^{(j)} (G^{(j)})^T M_i\big)$. For simplicity, we choose $G^{(0)} = G_0 = (I_d, 0)^T$. However, empirical results show that CPM is insensitive to the initial choice, as in [5, 14].
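The following sketch implements this iteration, reusing the hypothetical `scatter_matrices` helper from the previous sketch; it assumes the data has already been normalized so that $S_t = I_n$ (for example with `whiten` above), and the defaults for α, η, and the iteration cap are illustrative choices.

```python
import numpy as np

def cpm(X, y, d, alpha=0.5, eta=1e-6, max_iter=100):
    """Iterative CPM sketch: returns an n x d projection matrix G."""
    p, W, Sw, Sb, _ = scatter_matrices(X, y)            # helper from the earlier sketch
    n = X.shape[1]
    weights = [1.0 - alpha] + [alpha * pi for pi in p]  # w_0, w_1, ..., w_k
    Ms = [Sb] + [Wi - Sw for Wi in W]                   # M_0, M_1, ..., M_k
    G = np.eye(n, d)                                    # G_0 = (I_d, 0)^T
    s_prev = 0.0
    for _ in range(max_iter):
        # G^(j+1): top-d eigenvectors of sum_i w_i * M_i G G^T M_i.
        K = sum(w * (M @ G) @ (M @ G).T for w, M in zip(weights, Ms))
        evals, evecs = np.linalg.eigh(K)
        G_new = evecs[:, np.argsort(evals)[::-1][:d]]
        # s_(j+1) as in step (5) of the algorithm listed below.
        s = sum(w * np.trace(G_new.T @ M @ G @ G.T @ M @ G_new) for w, M in zip(weights, Ms))
        G = G_new
        if s > 0 and (s - s_prev) / s <= eta:           # stopping rule (s_(j+1) - s_j)/s_(j+1) <= eta
            break
        s_prev = s
    return G

# Usage: G = cpm(Xw, y, d=3, alpha=0.5); Z = Xw @ G
```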

The main steps of the CPM algorithm are as follows:
(1) Normalize the data so that $S_t = I_n$;
(2) Initialization: $G^{(0)} \leftarrow G_0$, $j \leftarrow 0$, and $s_0 \leftarrow 0$;
(3) Compute the first $d$ eigenvectors $\{\phi_i\}_{i=1}^d$ of $\sum_{i=0}^k w_i \big(M_i G^{(j)} (G^{(j)})^T M_i\big)$;
(4) $G^{(j+1)} \leftarrow [\phi_1, \ldots, \phi_d]$;
(5) $s_{j+1} \leftarrow \sum_{i=0}^k w_i \operatorname{trace}\big((G^{(j+1)})^T M_i G^{(j)} (G^{(j)})^T M_i G^{(j+1)}\big)$;
(6) Repeat (3)-(5) until $(s_{j+1} - s_j)/s_{j+1} \le \eta$.

The convergence of CPM is determined by the threshold η. In our experiments, we choose $\eta = 10^{-6}$. The convergence of the CPM algorithm is established in the following theorem:

Theorem 2.1. Let $\{s_j\}_{j=0}^\infty$ be defined as above. Then the sequence $\{s_j\}_{j=0}^\infty$ is nondecreasing and bounded from above. Thus the CPM algorithm converges.

Proof. By the definition of $s_j$ and $s_{j+1}$, and since each $M_i$ is symmetric, we have
$$s_{j+1} = \max_{G:\,G^T G = I_d} \sum_{i=0}^k w_i \operatorname{trace}\big(G^T M_i G^{(j)} (G^{(j)})^T M_i G\big) \ge \sum_{i=0}^k w_i \operatorname{trace}\big((G^{(j-1)})^T M_i G^{(j)} (G^{(j)})^T M_i G^{(j-1)}\big) = \sum_{i=0}^k w_i \operatorname{trace}\big((G^{(j)})^T M_i G^{(j-1)} (G^{(j-1)})^T M_i G^{(j)}\big) = s_j.$$
By the properties of the trace,
$$s_{j+1} = \sum_{i=0}^k w_i \operatorname{trace}\big((G^{(j+1)})^T M_i G^{(j)} (G^{(j)})^T M_i G^{(j+1)}\big) \le \sum_{i=0}^k w_i \operatorname{trace}(M_i M_i) = \sum_{i=0}^k w_i \|M_i\|_F^2.$$
The CPM algorithm thus converges.

The relationship between CPM and LDA is described in Proposition 2.1 below.

Proposition 2.1. CPM is equivalent to LDA if either of the following two conditions holds: (1) α = 0; or (2) all classes have the same covariance matrix, i.e., $W_i = S_w$ for all $i$.

Proof. Recall that in CPM, the optimal $G$ is computed by maximizing $f_\alpha(G)$, defined as
$$f_\alpha(G) = (1-\alpha)\|G^T S_b G\|_F^2 + \alpha \sum_{i=1}^k p_i \|G^T (W_i - S_w) G\|_F^2,$$
subject to the constraint that $G^T G = I_d$. When α = 0, or $W_i = S_w$ for all $i$, the second term in $f_\alpha(G)$ vanishes. Thus,
$$\arg\max_{G:\,G^T G = I_d} f_\alpha(G) = \arg\max_{G:\,G^T G = I_d} \|G^T S_b G\|_F^2 = \arg\max_{G:\,G^T G = I_d} \operatorname{trace}(G^T S_b G),$$
where the last equality follows from Lemma 2.1. This completes the proof of the proposition.

3 Relationship between CPM, SAVE, and HDA

In this section, we study the relationship between CPM, SAVE, and HDA. More specifically, SAVE is shown to be closely related to a special case of CPM, and CPM and SAVE are shown to be approximations of HDA. The theoretical analysis provides justification for the CPM algorithm and gives us insights into the nature of these different algorithms.

3.1 CPM versus SAVE

Cook et al. [2, 3] proposed the Sliced Average Variance Estimator (SAVE) to overcome the limitations of LDA. Like CPM, SAVE allows each class to have a different covariance; that is, the covariances $W_i$ and $W_j$ of the $i$-th and $j$-th classes ($i \ne j$) may be different. In their approach, the data is normalized so that the total covariance matrix is the identity (as in CPM), and the discriminant directions are then chosen by maximizing
$$(3.7)\quad \mathrm{SAVE}(\alpha) = \alpha^T \Big(\sum_{i=1}^k p_i (I_n - W_i)^2\Big)\, \alpha$$
over $\alpha \in \mathbb{R}^n$ of unit length, that is, $\|\alpha\| = 1$ (here α denotes a projection direction, not the CPM tuning parameter). The solutions are given by the top eigenvectors of $\sum_{i=1}^k p_i (I_n - W_i)^2$.
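For comparison with the CPM sketch above, the sketch below forms this SAVE matrix and returns its top eigenvectors; as before, it assumes whitened data and reuses the hypothetical `scatter_matrices` helper, with the class weights $p_i$ as reconstructed here.

```python
import numpy as np

def save_directions(X, y, d):
    """Top-d eigenvectors of sum_i p_i (I_n - W_i)^2 on data whitened so that S_t = I_n."""
    p, W, _, _, _ = scatter_matrices(X, y)     # helper from the earlier sketch
    n = X.shape[1]
    M = sum(pi * (np.eye(n) - Wi) @ (np.eye(n) - Wi) for pi, Wi in zip(p, W))
    evals, evecs = np.linalg.eigh(M)
    return evecs[:, np.argsort(evals)[::-1][:d]]

# Usage: Z_save = Xw @ save_directions(Xw, y, d=2)
```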

Since $S_w + S_b = S_t = I_n$ (after normalization), we have
$$\sum_{i=1}^k p_i (I_n - W_i)^2 = \sum_{i=1}^k p_i \big((S_w - W_i) + S_b\big)^2 = \sum_{i=1}^k p_i \big((S_w - W_i)^2 + S_b^2 + (S_w - W_i)S_b + S_b(S_w - W_i)\big),$$
which equals
$$(3.8)\quad \sum_{i=1}^k p_i (S_w - W_i)^2 + S_b^2,$$
where the last equality follows since $\sum_{i=1}^k p_i = 1$ and $\sum_{i=1}^k p_i (S_w - W_i) = 0$.

Next, let us take a closer look at the CPM algorithm presented in Section 2. It was shown empirically in [5] that
$$\arg\max_{G:\,G^T G=I_d} \sum_i w_i \|G^T M_i G\|_F^2 \approx \arg\max_{G:\,G^T G=I_d} \sum_i w_i \operatorname{trace}(G^T M_i^2 G).$$
Thus the solution to CPM can be approximated by maximizing
$$\tilde f_\alpha(G) = (1-\alpha)\operatorname{trace}(G^T S_b G) + \alpha \sum_{i=1}^k p_i \operatorname{trace}\big(G^T (W_i - S_w)^2 G\big),$$
and is given by the first $d$ eigenvectors of
$$(1-\alpha) S_b + \alpha \sum_{i=1}^k p_i (W_i - S_w)^2,$$
which contains the same set of eigenvectors as
$$(3.9)\quad \sum_{i=1}^k p_i (W_i - S_w)^2 + S_b$$
when α = 1/2. Note that the matrices in (3.8) and (3.9) differ only in the second term: $S_b^2$ is used in SAVE, whereas $S_b$ is used in CPM. Thus, SAVE is related to a special case of CPM in which the parameter α is set to 0.5. Our experimental results show that when α = 0.5, CPM often has performance similar to SAVE. Recall that the parameter α controls the tradeoff between the separation of the class centroids and the preservation of the class covariances. CPM is more flexible in dealing with different situations by varying the value of α. Experimental results in Section 4 show that CPM is competitive with SAVE and may significantly outperform it in some cases.
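The closed-form approximation discussed above and the SAVE matrix in (3.8) can be compared directly, as in the sketch below (again reusing the hypothetical `scatter_matrices` helper on whitened data); with α = 0.5 the two matrices differ only in whether $S_b$ or $S_b^2$ enters the second term, up to an overall factor of one half.

```python
import numpy as np

def cpm_approx_matrix(X, y, alpha):
    """(1 - alpha) S_b + alpha sum_i p_i (W_i - S_w)^2: its top-d eigenvectors
    approximate the CPM solution (cf. (3.9))."""
    p, W, Sw, Sb, _ = scatter_matrices(X, y)
    return (1.0 - alpha) * Sb + alpha * sum(pi * (Wi - Sw) @ (Wi - Sw) for pi, Wi in zip(p, W))

def save_matrix(X, y):
    """sum_i p_i (S_w - W_i)^2 + S_b^2, i.e. the SAVE matrix in (3.8)."""
    p, W, Sw, Sb, _ = scatter_matrices(X, y)
    return sum(pi * (Sw - Wi) @ (Sw - Wi) for pi, Wi in zip(p, W)) + Sb @ Sb

# K_cpm, K_save = cpm_approx_matrix(Xw, y, alpha=0.5), save_matrix(Xw, y)
```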

3.2 CPM versus HDA

LDA is optimal when all classes have a common covariance matrix. However, this assumption is quite strict and may not be applicable in some cases, as mentioned in Section 1. Kumar and Andreou [13] proposed Heteroscedastic Discriminant Analysis (HDA), which assumes that each class is Gaussian, but with possibly different covariance matrices, under the assumption that both the centroids and the covariance matrices coincide in a subspace of the observation space. More specifically, HDA assumes that there exist an integer $d$, $d < n$, a full rank matrix $G$ of dimension $n \times d$, and $\tilde G = [G, F] \in \mathbb{R}^{n \times n}$, such that
$$(3.10)\quad \tilde G^T c_i = \begin{pmatrix} G^T c_i \\ F^T c_0 \end{pmatrix}$$
and
$$(3.11)\quad \tilde G^T W_i \tilde G = \begin{pmatrix} G^T W_i G & 0 \\ 0 & F^T W_0 F \end{pmatrix},$$
for some $c_0$ and $W_0$ which are common for all $i$. This implies that HDA assumes that the differences between the classes lie solely in a subspace of $d$ dimensions (the projection by $G$), whereas in the complementary subspace of $n - d$ dimensions (the projection by $F$) the classes are identical, i.e., useless for discrimination. The probability density of $a_i \in \mathbb{R}^n$ under the above model is given as
$$P(a_i) = \frac{|\det(\tilde G)|}{(2\pi)^{n/2}\,\big|\tilde G^T W_{g(i)} \tilde G\big|^{1/2}} \exp\Big(-\tfrac{1}{2}\big(\tilde G^T a_i - \tilde G^T c_{g(i)}\big)^T \big(\tilde G^T W_{g(i)} \tilde G\big)^{-1} \big(\tilde G^T a_i - \tilde G^T c_{g(i)}\big)\Big),$$
where $a_i$ belongs to the class $g(i)$. The log-likelihood of the data under the linear transformation $\tilde G$ and under the constrained Gaussian model above is
$$(3.12)\quad \log L = \sum_i \log P(a_i) = -\frac{1}{2}\sum_i \Big\{ \big(\tilde G^T a_i - \tilde G^T c_{g(i)}\big)^T \big(\tilde G^T W_{g(i)} \tilde G\big)^{-1} \big(\tilde G^T a_i - \tilde G^T c_{g(i)}\big) + \log\big((2\pi)^n \big|\tilde G^T W_{g(i)} \tilde G\big|\big) \Big\} + N \log\big|\det(\tilde G)\big|.$$

The above likelihood function can then be maximized with respect to its parameters. Since there is no closed-form solution for maximizing the likelihood with respect to its parameters, the maximization has to be performed numerically. More details can be found in [13], where quadratic programming is used for the required optimization. As pointed out in [13], even though quadratic-optimization techniques are used, the likelihood surface is not strictly quadratic, and the optimization occasionally fails. We often observed this problem in our experiments.

In the rest of this section, we show that the CPM algorithm proposed in this paper is an approximation of HDA. The first assumption, in (3.10), requires that
$$\tilde G^T (c_i - c) = \begin{pmatrix} G^T (c_i - c) \\ 0 \end{pmatrix},$$
i.e., most of the information on $c_i - c$ is preserved in the subspace spanned by $G$. It follows that $G$ should be computed so that $\sum_{i=1}^k p_i \|G^T(c_i - c)\|^2 = \operatorname{trace}(G^T S_b G)$ is large. The second assumption, in (3.11), requires that
$$\tilde G^T (W_i - S_w) \tilde G = \begin{pmatrix} G^T (W_i - S_w) G & 0 \\ 0 & 0 \end{pmatrix}.$$
Similarly, this implies that $\sum_{i=1}^k p_i \|G^T (W_i - S_w) G\|_F^2$ should be large. These two assumptions can be simultaneously approximated by maximizing $h_\alpha(G)$ in (2.4). Thus, CPM can be considered an approximation of HDA. Our experimental results in the next section show that CPM is comparable to HDA in classification, while CPM is much more efficient and robust than HDA.

3.3 SAVE versus HDA

SAVE can also be considered an approximation of HDA, based on the following result.

Lemma 3.1. Let $S_b$ be defined as above and let $G \in \mathbb{R}^{n \times d}$ with $d < n$. Then
$$\arg\max_{G:\,G^T G = I_d} \operatorname{trace}(G^T S_b G) = \arg\max_{G:\,G^T G = I_d} \operatorname{trace}(G^T S_b^2 G).$$

Proof. Let $S_b = U \Sigma U^T$ be the SVD of $S_b$, where $U \in \mathbb{R}^{n \times q}$ has orthonormal columns, $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_q)$, $\sigma_1 \ge \cdots \ge \sigma_q > 0$, and $q = \operatorname{rank}(S_b)$. From Lemma 2.1, $\operatorname{trace}(G^T S_b G) \le \sum_{i=1}^q \sigma_i$, and the equality holds when $G = U$. Similarly,
$$\operatorname{trace}(G^T S_b^2 G) = \operatorname{trace}(G^T U \Sigma^2 U^T G) \le \sum_{i=1}^q \sigma_i^2,$$
and the maximum value is achieved when $G = U$. This completes the proof of the lemma.

Lemma 3.1 shows that the two assumptions of HDA are approximated by maximizing the criterion of SAVE in (3.8). While CPM applies the weight α to balance the tradeoff between these two assumptions, SAVE puts an equal weight on both terms.

4 Experiments

In this section, we evaluate CPM on both synthetic and real-world datasets. The 1-Nearest-Neighbor algorithm is applied for classification. For simplicity, a fixed value of α is used for CPM in all experiments; however, the best value of α may be estimated through cross-validation.

The first experiment is on a synthetic dataset, where the class centroids of the three classes do not coincide, but the classes have different covariance matrices. For the first two coordinates, class 1 is simulated from a multivariate Gaussian with mean (0, 0) and a diagonal covariance matrix C. Classes 2 and 3 are mixtures of two shifted Gaussians, and each Gaussian component also has covariance C. From Fig. 2, HDA, SAVE, and CPM separate the three classes better than LDA, which shows the effect of incorporating the class covariance information. LDA does not consider the class covariance information and fails to find the best projection when the class covariances vary.

Our next experiment is on the Pendigits dataset from the UCI machine learning repository (ftp://ftp.ics.uci.edu/pub/machine-learning-databases/). It contains the (x, y) coordinates of hand-written digits. Each digit is represented as a vector in 16-dimensional space. The dataset is divided into a training set (7494 digits) and a test set (3498 digits).
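A sketch of the evaluation protocol used in this section is given below: whiten using the training data only, learn a projection on the training data, project both sets, and classify the test points with the 1-Nearest-Neighbor rule. The function names are ours, and `project_fn` can be any of the earlier sketches (for example `cpm` or `save_directions`).

```python
import numpy as np

def evaluate_projection(X_train, y_train, X_test, y_test, project_fn, d):
    """1-NN test accuracy after projecting whitened data with project_fn(Xw, y, d)."""
    c = X_train.mean(axis=0)
    St = (X_train - c).T @ (X_train - c) / len(X_train)
    evals, evecs = np.linalg.eigh(St)
    T = evecs / np.sqrt(np.maximum(evals, 1e-10))        # whitening fit on the training set only
    Ztr, Zte = (X_train - c) @ T, (X_test - c) @ T
    G = project_fn(Ztr, y_train, d)
    Ztr, Zte = Ztr @ G, Zte @ G
    # 1-NN: label of the closest training point (y_train is assumed to be a NumPy array).
    d2 = ((Zte[:, None, :] - Ztr[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(y_train[np.argmin(d2, axis=1)] == y_test))

# acc = evaluate_projection(X_train, y_train, X_test, y_test,
#                           lambda Xw, yw, dd: cpm(Xw, yw, dd, alpha=0.5), d=4)
```

The pairwise-distance computation materializes an n_test x n_train array, which is fine at the scale of these datasets but should be chunked for larger ones.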

For the purpose of visualization, as in [16], we select three of the digit classes (the 6's, the 9's, and one other digit). We apply LDA, SAVE, HDA, and CPM to this 3-class subproblem, extract the leading discriminant directions, and project the test set onto them. The results are shown in Fig. 3. It is clear that LDA, HDA, and CPM separate the three classes well. SAVE, on the other hand, picks up directions in which one class has a much larger variance than the other two classes, as mentioned in [16]. This further confirms our claim in Section 3 that CPM is more flexible in dealing with different situations by choosing different values of α than SAVE, where α is fixed to be 0.5. Note that we also ran CPM with α = 0.5; the result is similar to SAVE and is thus omitted.

Next, we evaluate CPM in terms of classification on two datasets, Pendigits and Ionosphere, both from the UCI machine learning repository, using different numbers, d, of reduced dimensions. For Pendigits, we use a subset consisting of four digit classes (including the 5's, 6's, and 9's). Ionosphere is a binary dataset consisting of 351 instances of radar-collected data with 34 dimensions (200 instances for training and 151 instances for test). The results are summarized in Fig. 4, where the x-axis denotes the reduced dimension d for HDA, SAVE, and CPM, and the y-axis denotes the classification accuracy. The result for LDA, with the fixed reduced dimension k - 1, is also presented. We ran HDA on the Ionosphere dataset, but the algorithm did not complete. This may be due to the complex optimization problem involved in HDA, as mentioned in Section 3. We observe from Fig. 4 that the best accuracy for HDA, SAVE, and CPM often occurs when d > k - 1. In practice, the best dimension d can be estimated through cross-validation. Also note that the accuracy curves of HDA and CPM follow a similar trend on the Pendigits dataset (see Section 3 for the theoretical analysis), while they are quite different from that of SAVE. Overall, CPM is very competitive with the other three algorithms.

Finally, we compare CPM with SAVE and LDA in terms of classification on four datasets from the UCI repository. The results are summarized in Table 1. The reduced dimension used in CPM is set to k - 1, the reduced dimension used in LDA. Note that, based on our previous experiments, the performance of CPM may be improved by using a larger number of dimensions. HDA is not included in the comparison, since it is not able to complete for many of the datasets. We observe that CPM outperforms SAVE in all cases, while CPM is very competitive with LDA. We also observe that LDA performs reasonably well on all datasets. This implies that LDA is fairly robust in practice, even though the assumption involved in LDA may be violated.
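As noted above, both α and the reduced dimension d can be chosen by cross-validation; the sketch below selects α by k-fold cross-validation on the training set, reusing the hypothetical `cpm` and `evaluate_projection` helpers, with an illustrative candidate grid and fold count.

```python
import numpy as np

def select_alpha_by_cv(X, y, d, alphas=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9), n_folds=5, seed=0):
    """Return the alpha with the best mean k-fold cross-validated 1-NN accuracy."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    best_alpha, best_acc = None, -1.0
    for alpha in alphas:
        accs = []
        for f in range(n_folds):
            te = folds[f]
            tr = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            accs.append(evaluate_projection(X[tr], y[tr], X[te], y[te],
                                            lambda Xw, yw, dd: cpm(Xw, yw, dd, alpha=alpha), d))
        if np.mean(accs) > best_acc:
            best_alpha, best_acc = alpha, float(np.mean(accs))
    return best_alpha, best_acc

# best_alpha, cv_acc = select_alpha_by_cv(X_train, y_train, d=3)
```

The same loop can be wrapped around d instead of (or in addition to) α when the reduced dimension also needs to be selected.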
5 Conclusions

A new algorithm, namely CPM, is proposed for dimension reduction. It aims to maximize the class discrimination and preserve the class covariance simultaneously. The optimization problem involved in CPM is equivalent to low rank approximations of a collection of matrices, which can be solved iteratively. We have applied CPM to various datasets, and our empirical results show that: (1) CPM is capable of recovering the features for classification, even when all class centroids coincide or the covariance matrices vary, overcoming limitations of LDA; and (2) CPM is competitive with LDA, SAVE, and HDA in terms of classification. Our theoretical analysis reveals the close relationship between CPM and LDA, SAVE, and HDA. LDA is shown to be a special case of CPM, which generalizes LDA by utilizing the class covariance information. CPM and SAVE are shown to be approximations of HDA, while they are much more efficient and robust than HDA. SAVE is closely related to a special case of CPM in which the parameter α, which controls the tradeoff between the separation of the class centroids and the preservation of the class covariances, is set to 0.5. CPM is thus more flexible in dealing with different situations. These theoretical results further justify the criterion used in CPM and give us new insights into these algorithms.

Some directions for future work include: (1) extension of CPM to deal with high-dimensional data; and (2) extension of CPM to deal with nonlinear data using kernels.

Acknowledgements

Research of JY was sponsored, in part, by the Evolutionary Functional Genomics Center of the Biodesign Institute at Arizona State University. Support from Fellowships from Guidant Corporation and from the Department of Computer Science & Engineering at the University of Minnesota is gratefully acknowledged.

Figure 2: Projections by LDA, HDA, SAVE, and CPM on the synthetic dataset in which the class centroids of the three classes do not coincide but the classes have different covariance matrices. HDA, SAVE, and CPM consider the class covariance information and separate the three classes better than LDA.

Figure 3: Projections by LDA, HDA, SAVE, and CPM on the Pendigits dataset. LDA, HDA, and CPM separate the three classes better than SAVE.

Figure 4: Comparison of the classification accuracy of LDA, HDA, SAVE, and CPM on the Pendigits and Ionosphere datasets (x-axis: the reduced dimension d; y-axis: classification accuracy).

Table 1: Comparison of classification accuracy and standard deviation (in parentheses) for CPM, SAVE, and LDA on the Pima, Vote, Waveform, and Ionosphere datasets.

References

[1] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711-720, 1997.
[2] R. D. Cook and S. Weisberg. Discussion of Li (1991). Journal of the American Statistical Association, 86:328-332, 1991.
[3] R. D. Cook and X. Yin. Dimension reduction and visualization in discriminant analysis (with discussion). Australian and New Zealand Journal of Statistics, 43(2):147-199, 2001.
[4] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley, 2nd edition, 2000.
[5] C. Ding and J. Ye. 2-dimensional singular value decomposition for 2D maps and images. In SIAM Data Mining Conference Proceedings, 2005.
[6] S. Dudoit, J. Fridlyand, and T. P. Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97(457):77-87, 2002.
[7] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 2nd edition, 1990.
[8] T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
[9] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 3rd edition, 1996.
[10] T. Hastie and R. Tibshirani. Discriminant analysis by Gaussian mixtures. Journal of the Royal Statistical Society, Series B, 58:155-176, 1996.
[11] P. Howland, M. Jeon, and H. Park. Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 25(1):165-179, 2003.
[12] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.
[13] N. Kumar and A. G. Andreou. A generalization of linear discriminant analysis in maximum likelihood framework. In Proceedings of the Joint Meeting of the ASA, 1996.
[14] J. Ye. Generalized low rank approximations of matrices. In ICML Conference Proceedings, 2004.
[15] J. Ye. Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. Journal of Machine Learning Research, 6:483-502, 2005.
[16] M. Zhu and T. Hastie. Feature extraction for nonparametric discriminant analysis. Journal of Computational and Graphical Statistics, 12(1):101-120, 2003.


More information

Representation theory of SU(2), density operators, purification Michael Walter, University of Amsterdam

Representation theory of SU(2), density operators, purification Michael Walter, University of Amsterdam Symmetry and Quantum Information Feruary 6, 018 Representation theory of S(), density operators, purification Lecture 7 Michael Walter, niversity of Amsterdam Last week, we learned the asic concepts of

More information

Lecture 7: Con3nuous Latent Variable Models

Lecture 7: Con3nuous Latent Variable Models CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 7: Con3nuous Latent Variable Models All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Template-based Recognition of Static Sitting Postures

Template-based Recognition of Static Sitting Postures Template-based Recognition of Static Sitting Postures Manli Zhu 1, Aleix M. Martínez 1 and Hong Z. Tan 2 1 Dept. of Electrical Engineering The Ohio State University 2 School of Electrical and Computer

More information

Study on Classification Methods Based on Three Different Learning Criteria. Jae Kyu Suhr

Study on Classification Methods Based on Three Different Learning Criteria. Jae Kyu Suhr Study on Classification Methods Based on Three Different Learning Criteria Jae Kyu Suhr Contents Introduction Three learning criteria LSE, TER, AUC Methods based on three learning criteria LSE:, ELM TER:

More information

Kernel Methods. Machine Learning A W VO

Kernel Methods. Machine Learning A W VO Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance

More information

COVARIANCE REGULARIZATION FOR SUPERVISED LEARNING IN HIGH DIMENSIONS

COVARIANCE REGULARIZATION FOR SUPERVISED LEARNING IN HIGH DIMENSIONS COVARIANCE REGULARIZATION FOR SUPERVISED LEARNING IN HIGH DIMENSIONS DANIEL L. ELLIOTT CHARLES W. ANDERSON Department of Computer Science Colorado State University Fort Collins, Colorado, USA MICHAEL KIRBY

More information

Unsupervised Learning: K- Means & PCA

Unsupervised Learning: K- Means & PCA Unsupervised Learning: K- Means & PCA Unsupervised Learning Supervised learning used labeled data pairs (x, y) to learn a func>on f : X Y But, what if we don t have labels? No labels = unsupervised learning

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand

More information

A Least Squares Formulation for Canonical Correlation Analysis

A Least Squares Formulation for Canonical Correlation Analysis A Least Squares Formulation for Canonical Correlation Analysis Liang Sun, Shuiwang Ji, and Jieping Ye Department of Computer Science and Engineering Arizona State University Motivation Canonical Correlation

More information

Advanced Introduction to Machine Learning

Advanced Introduction to Machine Learning 10-715 Advanced Introduction to Machine Learning Homework 3 Due Nov 12, 10.30 am Rules 1. Homework is due on the due date at 10.30 am. Please hand over your homework at the beginning of class. Please see

More information