Two-Dimensional Linear Discriminant Analysis
|
|
- Suzan Russell
- 6 years ago
- Views:
Transcription
1 Two-Dimensional Linear Discriminant Analysis Jieping Ye Department of CSE University of Minnesota Ravi Janardan Department of CSE University of Minnesota Qi Li Department of CIS University of Delaware Astract Linear Discriminant Analysis (LDA) is a well-known scheme for feature extraction and dimension reduction. It has een used widely in many applications involving high-dimensional data, such as face recognition and image retrieval. An intrinsic limitation of classical LDA is the so-called singularity prolem, that is, it fails when all scatter matrices are singular. A well-known approach to deal with the singularity prolem is to apply an intermediate dimension reduction stage using Principal Component Analysis (PCA) efore LDA. The algorithm, called PCA+LDA, is used widely in face recognition. However, PCA+LDA has high costs in time and space, due to the need for an eigen-decomposition involving the scatter matrices. In this paper, we propose a novel LDA algorithm, namely, which stands for 2-Dimensional Linear Discriminant Analysis. overcomes the singularity prolem implicitly, while achieving efficiency. The key difference etween and classical LDA lies in the model for data representation. Classical LDA works with vectorized representations of data, while the algorithm works with data in matrix representation. To further reduce the dimension y, the comination of and classical LDA, namely, is studied, where LDA is preceded y. The proposed algorithms are applied on face recognition and compared with PCA+LDA. Experiments show that and achieve competitive recognition accuracy, while eing much more efficient. Introduction Linear Discriminant Analysis [2, 4] is a well-known scheme for feature extraction and dimension reduction. It has een used widely in many applications such as face recognition [], image retrieval [6], microarray data classification [3], etc. Classical LDA projects the data onto a lower-dimensional vector space such that the ratio of the etween-class distance to the within-class distance is maximized, thus achieving maximum discrimination. The optimal projection (transformation) can e readily computed y applying the eigendecomposition on the scatter matrices. An intrinsic limitation of classical LDA is that its ojective function requires the nonsingularity of one of the scatter matrices. For many applications, such as face recognition, all scatter matrices in question can e singular since the data is from a very high-dimensional space, and in general, the dimension exceeds the
2 numer of data points. This is known as the undersampled or singularity prolem [5]. In recent years, many approaches have een rought to ear on such high-dimensional, undersampled prolems, including pseudo-inverse LDA, PCA+LDA, and regularized LDA. More details can e found in [5]. Among these LDA extensions, PCA+LDA has received a lot of attention, especially for face recognition []. In this two-stage algorithm, an intermediate dimension reduction stage using PCA is applied efore LDA. The common aspect of previous LDA extensions is the computation of eigen-decomposition of certain large matrices, which not only degrades the efficiency ut also makes it hard to scale them to large datasets. In this paper, we present a novel approach to alleviate the expensive computation of the eigen-decomposition in previous LDA extensions. The novelty lies in a different data representation model. Under this model, each datum is represented as a matrix, instead of as a vector, and the collection of data is represented as a collection of matrices, instead of as a single large matrix. This model has een previously used in [8, 9, 7] for the generalization of SVD and PCA. Unlike classical LDA, we consider the projection of the data onto a space which is the tensor product of two vector spaces. We formulate our dimension reduction prolem as an optimization prolem in Section 3. Unlike classical LDA, there is no closed form solution for the optimization prolem; instead, we derive a heuristic, namely. To further reduce the dimension, which is desirale for efficient querying, we consider the comination of and LDA, namely, where the dimension of the space transformed y is further reduced y LDA. We perform experiments on three well-known face datasets to evaluate the effectiveness of and and compare with PCA+LDA, which is used widely in face recognition. Our experiments show that: () is applicale to high-dimensional undersampled data such as face images, i.e., it implicitly avoids the singularity prolem encountered in classical LDA; and (2) and have distinctly lower costs in time and space than PCA+LDA, and achieve classification accuracy that is competitive with PCA+LDA. 2 An overview of LDA In this section, we give a rief overview of classical LDA. Some of the important notations used in the rest of this paper are listed in Tale. Given a data matrix A IR N n, classical LDA aims to find a transformation G IR N l that maps each column a i of A, for i n, in the N-dimensional space to a vector i in the l-dimensional space. That is G : a i IR N i = G T a i IR l (l<n). Equivalently, classical LDA aims to find a vector space G spanned y {g i } l i=, where G =[g,,g l ], such that each a i is projected onto G y (g T a i,,gl T a i) T IR l. Assume that the original data in A is partitioned into k classes as A = {Π,, Π k }, where Π i contains n i data points from the ith class, and k i= n i = n. Classical LDA aims to find the optimal transformation G such that the class structure of the original high-dimensional space is preserved in the low-dimensional space. In general, if each class is tightly grouped, ut well separated from the other classes, the quality of the cluster is considered to e high. In discriminant analysis, two scatter matrices, called within-class (S w ) and etween-class (S ) matrices, are defined to quantify the quality of the cluster, as follows [4]: S w = k i= x Π i (x m i )(x m i ) T, and S = k i= n i(m i m)(m i m) T, where m i = n i x Π i x is the mean of the ith class, and m = k n i= x Π i x is the gloal mean.
3 Notation Description n numer of images in the dataset k numer of classes in the dataset A i ith image in matrix representation a i ith image in vectorized representation r numer of rows in A i c numer of columns in A i N dimension of a i (N = r c) Π j jth class in the dataset L transformation matrix (left) y R transformation matrix (right) y I numer of iterations in B i reduced representation of A i y l numer of rows in B i l 2 numer of columns in B i Tale : Notation It is easy to verify that trace(s w ) measures the closeness of the vectors within the classes, while trace(s ) measures the separation etween classes. In the low-dimensional space resulting from the linear transformation G (or the linear projection onto the vector space G), the within-class and etween-class matrices ecome S L = GT S G, and S L w = G T S w G. An optimal transformation G would maximize trace(s L) and minimize trace(sl w). Common optimizations in classical discriminant analysis include (see [4]): { max trace((s L w ) S L ) } { and min trace((s L ) Sw) L }. () G G The optimization prolems in Eq. () are equivalent to the following generalized eigenvalue prolem: S x = λs w x, for λ 0. The solution can e otained y applying an eigendecomposition to the matrix Sw S,ifS w is nonsingular, or S S w,ifs is nonsingular. There are at most k eigenvectors corresponding to nonzero eigenvalues, since the rank of the matrix S is ounded from aove y k. Therefore, the reduced dimension y classical LDA is at most k. A stale way to compute the eigen-decomposition is to apply SVD on the scatter matrices. Details can e found in [6]. Note that a limitation of classical LDA in many applications involving undersampled data, such as text documents and images, is that at least one of the scatter matrices is required to e nonsingular. Several extensions, including pseudo-inverse LDA, regularized LDA, and PCA+LDA, were proposed in the past to deal with the singularity prolem. Details can e found in [5]. 3 2-Dimensional LDA The key difference etween classical LDA and the that we propose in this paper is in the representation of data. While classical LDA uses the vectorized representation, works with data in matrix representation. We will see later in this section that the matrix representation in leads to an eigendecomposition on matrices with much smaller sizes. More specifically, involves the eigen-decomposition of matrices with sizes r r and c c, which are much smaller than the matrices in classical LDA. This dramatically reduces the time and space complexities of over LDA.
4 Unlike classical LDA, considers the following (l l 2 )-dimensional space L R, which is the tensor product of the following two spaces: L spanned y {u i } l i= and R spanned y {v i } l2 i=. Define two matrices L = [u,,u l ] IR r l and R = [v,,v l2 ] IR c l2. Then the projection of X IR r c onto the space L Ris L T XR R l l2. Let A i IR r c, for i =,,n, e the n images in the dataset, clustered into classes Π,, Π k, where Π i has n i images. Let M i = n i X Π i X e the mean of the ith class, i k, and M = k n i= X Π i X e the gloal mean. In, we consider images as two-dimensional signals and aim to find two transformation matrices L IR r l and R IR c l2 that map each A i IR r c, for i n, to a matrix B i IR l l2 such that B i = L T A i R. Like classical LDA, aims to find the optimal transformations (projections) L and R such that the class structure of the original high-dimensional space is preserved in the low-dimensional space. A natural similarity metric etween matrices is the Froenius norm [8]. Under this metric, the (squared) within-class and etween-class distances D w and D can e computed as follows: k k D w = X M i 2 F, D = n i M i M 2 F. i= X Π i i= Using the property of the trace, that is, trace(mm T )= M 2 F, for any matrix M, we can rewrite D w and D as follows: ( k ) D w = trace (X M i )(X M i ) T, i= X Π i ( k ) D = trace n i (M i M)(M i M) T. i= In the low-dimensional space resulting from the linear transformations L and R, the withinclass and etween-class distances ecome ( k ) D w = trace L T (X M i )RR T (X M i ) T L, i= X Π i ( k ) D = trace n i L T (M i M)RR T (M i M) T L. i= The optimal transformations L and R would maximize D and minimize D w. Due to the difficulty of computing the optimal L and R simultaneously, we derive an iterative algorithm in the following. More specifically, for a fixed R, we can compute the optimal L y solving an optimization prolem similar to the one in Eq. (). With the computed L, we can then update R y solving another optimization prolem as the one in Eq. (). Details are given elow. The procedure is repeated a certain numer of times, as discussed in Section 4. Computation of L For a fixed R, D w and D can e rewritten as D w = trace ( L T S R w L ), D = trace ( L T S R L ),
5 Algorithm (A,,A n, l, l 2 ) Input: A,,A n, l, l 2 Output: L, R, B,,B n. Compute the mean M i of ith class for each i as M i = n i X Π i X; 2. Compute the gloal mean M = k n i= X Π i X; 3. R 0 (I l2, 0) T ; 4. For j fromtoi 5. Sw R k i= X Π i (X M i )R j Rj T (X M i) T, S R k i= n i(m i M)R j Rj T (M i M) T ; 6. Compute the first l eigenvectors {φ L l }l l= of ( ) Sw R S R ; 7. L j [ ] φ L,,φ L l 8. Sw L k i= X Π i (X M i ) T L j L T j (X M i), S L k i= n i(m i M) T L j L T j (M i M); 9. Compute the first l 2 eigenvectors {φ R l }l2 l= of ( Sw) L S L ; 0. R j [ ] φ R,,φ R l 2 ;. EndFor 2. L L I, R R I ; 3. B l L T A l R, for l =,,n; 4. return(l, R, B,,B n ). where S R w = k i= X Π i (X M i )RR T (X M i ) T, S R = k n i (M i M)RR T (M i M) T. Similar to the optimization prolem in Eq. (), the optimal L can e computed y solving the following optimization prolem: max L trace ( (L T S R w L) (L T S R L)). The solution can e otained y solving the following generalized eigenvalue prolem: S R w x = λs R x. Since S R w is in general nonsingular, the optimal L can e otained y computing an eigendecomposition on ( S R w ) S R. Note that the size of the matrices S R w and S R is r r, which is much smaller than the size of the matrices S w and S in classical LDA. Computation of R Next, consider the computation of R, for a fixed L. A key oservation is that D w and D can e rewritten as D w = trace ( R T SwR L ), D = trace ( R T S L R ), where k k Sw L = (X M i ) T LL T (X M i ), S L = n i (M i M) T LL T (M i M). i= X Π i i= This follows from the following property of trace, that is, trace(ab) =trace(ba), for any two matrices A and B. Similarly, the optimal R can e computed y solving the following optimization prolem: max R trace ( (R T SwR) L (R T S LR)). The solution can e otained y solving the following generalized eigenvalue prolem: Swx L = λs Lx. Since SL w is in general nonsingular, i=
6 the optimal R can e otained y computing an eigen-decomposition on ( S L w) S L. Note that the size of the matrices S L w and S L is c c, much smaller than S w and S. The pseudo-code for the algorithm is given in Algorithm. It is clear that the most expensive steps in Algorithm are in Lines 5, 8 and 3 and the total time complexity is O(n max(l,l 2 )(r + c) 2 I), where I is the numer of iterations. The algorithm depends on the initial choice R 0. Our experiments show that choosing R 0 =(I l2, 0) T, where I l2 is the identity matrix, produces excellent results. We use this initial R 0 in all the experiments. Since the numer of rows (r) and the numer of columns (c) of an image A i are generally comparale, i.e., r c N, we set l and l 2 to a common value d in the rest of this paper, for simplicity. However, the algorithm works in the general case. With this simplification, the time complexity of the algorithm ecomes O(ndNI). The space complexity of is O(rc) =O(N). The key to the low space complexity of the algorithm is that the matrices Sw R, S R, SL w, and S L can e formed y reading the matrices A l incrementally. 3. As mentioned in the Introduction, PCA is commonly applied as an intermediate dimensionreduction stage efore LDA to overcome the singularity prolem of classical LDA. In this section, we consider the comination of and LDA, namely, where the dimension y is further reduced y LDA, since small reduced dimension is desirale for efficient querying. More specifically, in the first stage of, each image A i IR r c is reduced to B i IR d d y, with d<min(r, c). In the second stage, each B i is first transformed to a vector i IR d2 (matrix-to-vector alignment), then i is further reduced to L i IR k y LDA with k <d 2, where k is the numer of classes. Here, matrix-to-vector alignment means that the matrix is transformed to a vector y concatenating all its rows together consecutively. The time complexity of the first stage y is O(ndNI). The second stage applies classical LDA to data in d 2 -dimensional space, hence takes O(n(d 2 ) 2 ), assuming n>d 2. Hence the total time complexity of is O ( nd(ni + d 3 ) ). 4 Experiments In this section, we experimentally evaluate the performance of and on face images and compare with PCA+LDA, used widely in face recognition. For PCA+LDA, we use 200 principal components in the PCA stage, as it produces good overall results. All of our experiments are performed on a P4.80GHz Linux machine with GB memory. For all the experiments, the -Nearest-Neighor (NN) algorithm is applied for classification and ten-fold cross validation is used for computing the classification accuracy. Datasets: We use three face datasets in our study: PIX, ORL, and PIE, which are pulicly availale. PIX (availale at contains 300 face images of 30 persons. The image size is We susample the images down to a size of = ORL (availale at contains 400 face images of 40 persons. The image size is PIE is a suset of the CMU PIE face image dataset (availale at 48.html). It contains 665 face images of 63 persons. The image size is We susample the images down to a size of = Note that PIE is much larger than the other two datasets.
7 Numer of iterations Numer of iterations Numer of iterations Figure : Effect of the numer of iterations on and using the three face datasets; PIX, ORL and PIE (from left to right). The impact of the numer, I, of iterations: In this experiment, we study the effect of the numer of iterations (I in Algorithm ) on and. The results are shown in Figure, where the x-axis denotes the numer of iterations, and the y-axis denotes the classification accuracy. d =0is used for oth algorithms. It is clear that oth accuracy curves are stale with respect to the numer of iterations. In general, the accuracy curves of are slightly more stale than those of. The key consequence is that we only need to run the for loop (from Line 4 to Line ) in Algorithm only once, i.e., I =, which significantly reduces the total running time of oth algorithms. The impact of the value of the reduced dimension d: In this experiment, we study the effect of the value of d on and, where the value of d determines the dimensionality in the transformed space y. We did extensive experiments using different values of d on the face image datasets. The results are summarized in Figure 2, where the x-axis denotes the values of d (etween and 5) and the y-axis denotes the classification accuracy with -Nearest-Neighor as the classifier. As shown in Figure 2, the accuracy curves on all datasets stailize around d =4to 6. Comparison on classification accuracy and efficiency: In this experiment, we evaluate the effectiveness of the proposed algorithms in terms of classification accuracy and efficiency and compare with PCA+LDA. The results are summarized in Tale 2. We can oserve that has similar performance as PCA+LDA in classification, while it outperforms. Hence the LDA stage in not only reduces the dimension, ut also increases the accuracy. Another key oservation from Tale 2 is that is almost one order of magnitude faster than PCA+LDA, while, the running time of is close to that of. Hence is a more effective dimension reduction algorithm than PCA+LDA, as it is competitive to PCA+LDA in classification and has the same numer of reduced dimensions in the transformed space, while it has much lower time and space costs. 5 Conclusions An efficient algorithm, namely, is presented for dimension reduction. is an extension of LDA. The key difference etween and LDA is that works on the matrix representation of images directly, while LDA uses a vector representation. has asymptotically minimum memory requirements, and lower time complexity than LDA, which is desirale for large face datasets, while it implicitly avoids the singularity prolem encountered in classical LDA. We also study the comination of and LDA, namely, where the dimension y is further reduced y LDA. Experiments show that and are competitive with PCA+LDA, in terms of classification accuracy, while they have significantly lower time and space costs.
8 Value of d Value of d Value of d Figure 2: Effect of the value of the reduced dimension d on and using the three face datasets; PIX, ORL and PIE (from left to right). Dataset PCA+LDA Time(Sec) Time(Sec) Time(Sec) PIX 98.00% % %.73 ORL 97.75% % % 2.9 PIE 99.32% 53 00% 57 Tale 2: Comparison on classification accuracy and efficiency: means that PCA+LDA is not applicale for PIE, due to its large size. Note that PCA+LDA involves an eigendecomposition of the scatter matrices, which requires the whole data matrix to reside in main memory. Acknowledgment Research of J. Ye and R. Janardan is sponsored, in part, y the Army High Performance Computing Research Center under the auspices of the Department of the Army, Army Research Laoratory cooperative agreement numer DAAD , the content of which does not necessarily reflect the position or the policy of the government, and no official endorsement should e inferred. References [] P.N. Belhumeour, J.P. Hespanha, and D.J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(7):7 720, 997. [2] R.O. Duda, P.E. Hart, and D. Stork. Pattern Classification. Wiley, [3] S. Dudoit, J. Fridlyand, and T. P. Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97(457):77 87, [4] K. Fukunaga. Introduction to Statistical Pattern Classification. Academic Press, San Diego, California, USA, 990. [5] W.J. Krzanowski, P. Jonathan, W.V McCarthy, and M.R. Thomas. Discriminant analysis with singular covariance matrices: methods and applications to spectroscopic data. Applied Statistics, 44:0 5, 995. [6] Daniel L. Swets and Juyang Weng. Using discriminant eigenfeatures for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(8):83 836, 996. [7] J. Yang, D. Zhang, A.F. Frangi, and J.Y. Yang. Two-dimensional PCA: a new approach to appearance-ased face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26():3 37, [8] J. Ye. Generalized low rank approximations of matrices. In ICML Conference Proceedings, pages , [9] J. Ye, R. Janardan, and Q. Li. GPCA: An efficient dimension reduction scheme for image compression and retrieval. In ACM SIGKDD Conference Proceedings, pages , 2004.
Efficient Kernel Discriminant Analysis via QR Decomposition
Efficient Kernel Discriminant Analysis via QR Decomposition Tao Xiong Department of ECE University of Minnesota txiong@ece.umn.edu Jieping Ye Department of CSE University of Minnesota jieping@cs.umn.edu
More informationCPM: A Covariance-preserving Projection Method
CPM: A Covariance-preserving Projection Method Jieping Ye Tao Xiong Ravi Janardan Astract Dimension reduction is critical in many areas of data mining and machine learning. In this paper, a Covariance-preserving
More informationDimensionality Reduction Using PCA/LDA. Hongyu Li School of Software Engineering TongJi University Fall, 2014
Dimensionality Reduction Using PCA/LDA Hongyu Li School of Software Engineering TongJi University Fall, 2014 Dimensionality Reduction One approach to deal with high dimensional data is by reducing their
More informationWhen Fisher meets Fukunaga-Koontz: A New Look at Linear Discriminants
When Fisher meets Fukunaga-Koontz: A New Look at Linear Discriminants Sheng Zhang erence Sim School of Computing, National University of Singapore 3 Science Drive 2, Singapore 7543 {zhangshe, tsim}@comp.nus.edu.sg
More informationIntegrating Global and Local Structures: A Least Squares Framework for Dimensionality Reduction
Integrating Global and Local Structures: A Least Squares Framework for Dimensionality Reduction Jianhui Chen, Jieping Ye Computer Science and Engineering Department Arizona State University {jianhui.chen,
More informationKernel Uncorrelated and Orthogonal Discriminant Analysis: A Unified Approach
Kernel Uncorrelated and Orthogonal Discriminant Analysis: A Unified Approach Tao Xiong Department of ECE University of Minnesota, Minneapolis txiong@ece.umn.edu Vladimir Cherkassky Department of ECE University
More informationAn Efficient Pseudoinverse Linear Discriminant Analysis method for Face Recognition
An Efficient Pseudoinverse Linear Discriminant Analysis method for Face Recognition Jun Liu, Songcan Chen, Daoqiang Zhang, and Xiaoyang Tan Department of Computer Science & Engineering, Nanjing University
More informationSolving the face recognition problem using QR factorization
Solving the face recognition problem using QR factorization JIANQIANG GAO(a,b) (a) Hohai University College of Computer and Information Engineering Jiangning District, 298, Nanjing P.R. CHINA gaojianqiang82@126.com
More informationSymmetric Two Dimensional Linear Discriminant Analysis (2DLDA)
Symmetric Two Dimensional inear Discriminant Analysis (2DDA) Dijun uo, Chris Ding, Heng Huang University of Texas at Arlington 701 S. Nedderman Drive Arlington, TX 76013 dijun.luo@gmail.com, {chqding,
More informationLEC 5: Two Dimensional Linear Discriminant Analysis
LEC 5: Two Dimensional Linear Discriminant Analysis Dr. Guangliang Chen March 8, 2016 Outline Last time: LDA/QDA (classification) Today: 2DLDA (dimensionality reduction) HW3b and Midterm Project 3 What
More informationBilinear Discriminant Analysis for Face Recognition
Bilinear Discriminant Analysis for Face Recognition Muriel Visani 1, Christophe Garcia 1 and Jean-Michel Jolion 2 1 France Telecom Division R&D, TECH/IRIS 2 Laoratoire LIRIS, INSA Lyon 4, rue du Clos Courtel
More informationFace Recognition. Face Recognition. Subspace-Based Face Recognition Algorithms. Application of Face Recognition
ace Recognition Identify person based on the appearance of face CSED441:Introduction to Computer Vision (2017) Lecture10: Subspace Methods and ace Recognition Bohyung Han CSE, POSTECH bhhan@postech.ac.kr
More informationLecture: Face Recognition
Lecture: Face Recognition Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 12-1 What we will learn today Introduction to face recognition The Eigenfaces Algorithm Linear
More informationWeighted Generalized LDA for Undersampled Problems
Weighted Generalized LDA for Undersampled Problems JING YANG School of Computer Science and echnology Nanjing University of Science and echnology Xiao Lin Wei Street No.2 2194 Nanjing CHINA yangjing8624@163.com
More informationPCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given
More informationLinear Discriminant Analysis Using Rotational Invariant L 1 Norm
Linear Discriminant Analysis Using Rotational Invariant L 1 Norm Xi Li 1a, Weiming Hu 2a, Hanzi Wang 3b, Zhongfei Zhang 4c a National Laboratory of Pattern Recognition, CASIA, Beijing, China b University
More informationExample: Face Detection
Announcements HW1 returned New attendance policy Face Recognition: Dimensionality Reduction On time: 1 point Five minutes or more late: 0.5 points Absent: 0 points Biometrics CSE 190 Lecture 14 CSE190,
More informationECE 661: Homework 10 Fall 2014
ECE 661: Homework 10 Fall 2014 This homework consists of the following two parts: (1) Face recognition with PCA and LDA for dimensionality reduction and the nearest-neighborhood rule for classification;
More informationDimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas
Dimensionality Reduction: PCA Nicholas Ruozzi University of Texas at Dallas Eigenvalues λ is an eigenvalue of a matrix A R n n if the linear system Ax = λx has at least one non-zero solution If Ax = λx
More informationA Modified Incremental Principal Component Analysis for On-line Learning of Feature Space and Classifier
A Modified Incremental Principal Component Analysis for On-line Learning of Feature Space and Classifier Seiichi Ozawa, Shaoning Pang, and Nikola Kasabov Graduate School of Science and Technology, Kobe
More informationA Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier
A Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier Seiichi Ozawa 1, Shaoning Pang 2, and Nikola Kasabov 2 1 Graduate School of Science and Technology,
More informationPCA and LDA. Man-Wai MAK
PCA and LDA Man-Wai MAK Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University enmwmak@polyu.edu.hk http://www.eie.polyu.edu.hk/ mwmak References: S.J.D. Prince,Computer
More informationPCA and LDA. Man-Wai MAK
PCA and LDA Man-Wai MAK Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University enmwmak@polyu.edu.hk http://www.eie.polyu.edu.hk/ mwmak References: S.J.D. Prince,Computer
More informationChapter 2 Canonical Correlation Analysis
Chapter 2 Canonical Correlation Analysis Canonical correlation analysis CCA, which is a multivariate analysis method, tries to quantify the amount of linear relationships etween two sets of random variales,
More informationA Unified Bayesian Framework for Face Recognition
Appears in the IEEE Signal Processing Society International Conference on Image Processing, ICIP, October 4-7,, Chicago, Illinois, USA A Unified Bayesian Framework for Face Recognition Chengjun Liu and
More informationRecognition Using Class Specific Linear Projection. Magali Segal Stolrasky Nadav Ben Jakov April, 2015
Recognition Using Class Specific Linear Projection Magali Segal Stolrasky Nadav Ben Jakov April, 2015 Articles Eigenfaces vs. Fisherfaces Recognition Using Class Specific Linear Projection, Peter N. Belhumeur,
More informationClustering VS Classification
MCQ Clustering VS Classification 1. What is the relation between the distance between clusters and the corresponding class discriminability? a. proportional b. inversely-proportional c. no-relation Ans:
More informationCourse 495: Advanced Statistical Machine Learning/Pattern Recognition
Course 495: Advanced Statistical Machine Learning/Pattern Recognition Deterministic Component Analysis Goal (Lecture): To present standard and modern Component Analysis (CA) techniques such as Principal
More informationDecember 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis
.. December 20, 2013 Todays lecture. (PCA) (PLS-R) (LDA) . (PCA) is a method often used to reduce the dimension of a large dataset to one of a more manageble size. The new dataset can then be used to make
More informationPCA, Kernel PCA, ICA
PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per
More informationTwo-stage Methods for Linear Discriminant Analysis: Equivalent Results at a Lower Cost
Two-stage Methods for Linear Discriminant Analysis: Equivalent Results at a Lower Cost Peg Howland and Haesun Park Abstract Linear discriminant analysis (LDA) has been used for decades to extract features
More informationDiscriminant Uncorrelated Neighborhood Preserving Projections
Journal of Information & Computational Science 8: 14 (2011) 3019 3026 Available at http://www.joics.com Discriminant Uncorrelated Neighborhood Preserving Projections Guoqiang WANG a,, Weijuan ZHANG a,
More informationLecture 17: Face Recogni2on
Lecture 17: Face Recogni2on Dr. Juan Carlos Niebles Stanford AI Lab Professor Fei-Fei Li Stanford Vision Lab Lecture 17-1! What we will learn today Introduc2on to face recogni2on Principal Component Analysis
More informationLecture 13 Visual recognition
Lecture 13 Visual recognition Announcements Silvio Savarese Lecture 13-20-Feb-14 Lecture 13 Visual recognition Object classification bag of words models Discriminative methods Generative methods Object
More informationFoundations of Computer Vision
Foundations of Computer Vision Wesley. E. Snyder North Carolina State University Hairong Qi University of Tennessee, Knoxville Last Edited February 8, 2017 1 3.2. A BRIEF REVIEW OF LINEAR ALGEBRA Apply
More informationImage Analysis & Retrieval. Lec 14. Eigenface and Fisherface
Image Analysis & Retrieval Lec 14 Eigenface and Fisherface Zhu Li Dept of CSEE, UMKC Office: FH560E, Email: lizhu@umkc.edu, Ph: x 2346. http://l.web.umkc.edu/lizhu Z. Li, Image Analysis & Retrv, Spring
More informationMultiple Similarities Based Kernel Subspace Learning for Image Classification
Multiple Similarities Based Kernel Subspace Learning for Image Classification Wang Yan, Qingshan Liu, Hanqing Lu, and Songde Ma National Laboratory of Pattern Recognition, Institute of Automation, Chinese
More informationEnhanced Fisher Linear Discriminant Models for Face Recognition
Appears in the 14th International Conference on Pattern Recognition, ICPR 98, Queensland, Australia, August 17-2, 1998 Enhanced isher Linear Discriminant Models for ace Recognition Chengjun Liu and Harry
More informationGroup Sparse Non-negative Matrix Factorization for Multi-Manifold Learning
LIU, LU, GU: GROUP SPARSE NMF FOR MULTI-MANIFOLD LEARNING 1 Group Sparse Non-negative Matrix Factorization for Multi-Manifold Learning Xiangyang Liu 1,2 liuxy@sjtu.edu.cn Hongtao Lu 1 htlu@sjtu.edu.cn
More informationAdaptive Kernel Principal Component Analysis With Unsupervised Learning of Kernels
Adaptive Kernel Principal Component Analysis With Unsupervised Learning of Kernels Daoqiang Zhang Zhi-Hua Zhou National Laboratory for Novel Software Technology Nanjing University, Nanjing 2193, China
More informationMyoelectrical signal classification based on S transform and two-directional 2DPCA
Myoelectrical signal classification based on S transform and two-directional 2DPCA Hong-Bo Xie1 * and Hui Liu2 1 ARC Centre of Excellence for Mathematical and Statistical Frontiers Queensland University
More informationMachine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component
More informationUncorrelated Multilinear Principal Component Analysis through Successive Variance Maximization
Uncorrelated Multilinear Principal Component Analysis through Successive Variance Maximization Haiping Lu 1 K. N. Plataniotis 1 A. N. Venetsanopoulos 1,2 1 Department of Electrical & Computer Engineering,
More informationClass Mean Vector Component and Discriminant Analysis for Kernel Subspace Learning
1 Class Mean Vector Component and Discriminant Analysis for Kernel Suspace Learning Alexandros Iosifidis Department of Engineering, ECE, Aarhus University, Denmark alexandros.iosifidis@eng.au.dk arxiv:1812.05988v1
More informationMachine Learning - MT & 14. PCA and MDS
Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)
More informationSmall sample size in high dimensional space - minimum distance based classification.
Small sample size in high dimensional space - minimum distance based classification. Ewa Skubalska-Rafaj lowicz Institute of Computer Engineering, Automatics and Robotics, Department of Electronics, Wroc
More informationDivide-and-Conquer. Reading: CLRS Sections 2.3, 4.1, 4.2, 4.3, 28.2, CSE 6331 Algorithms Steve Lai
Divide-and-Conquer Reading: CLRS Sections 2.3, 4.1, 4.2, 4.3, 28.2, 33.4. CSE 6331 Algorithms Steve Lai Divide and Conquer Given an instance x of a prolem, the method works as follows: divide-and-conquer
More informationPattern Recognition 2
Pattern Recognition 2 KNN,, Dr. Terence Sim School of Computing National University of Singapore Outline 1 2 3 4 5 Outline 1 2 3 4 5 The Bayes Classifier is theoretically optimum. That is, prob. of error
More informationMachine learning for pervasive systems Classification in high-dimensional spaces
Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version
More informationAdaptive Discriminant Analysis by Minimum Squared Errors
Adaptive Discriminant Analysis by Minimum Squared Errors Haesun Park Div. of Computational Science and Engineering Georgia Institute of Technology Atlanta, GA, U.S.A. (Joint work with Barry L. Drake and
More informationA Statistical Analysis of Fukunaga Koontz Transform
1 A Statistical Analysis of Fukunaga Koontz Transform Xiaoming Huo Dr. Xiaoming Huo is an assistant professor at the School of Industrial and System Engineering of the Georgia Institute of Technology,
More informationLinear Algebra Methods for Data Mining
Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 Linear Discriminant Analysis Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki Principal
More informationPrincipal Component Analysis
Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand
More informationDATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD
DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary
More informationRobust Tensor Factorization Using R 1 Norm
Robust Tensor Factorization Using R Norm Heng Huang Computer Science and Engineering University of Texas at Arlington heng@uta.edu Chris Ding Computer Science and Engineering University of Texas at Arlington
More informationPrincipal Component Analysis (PCA)
Principal Component Analysis (PCA) Salvador Dalí, Galatea of the Spheres CSC411/2515: Machine Learning and Data Mining, Winter 2018 Michael Guerzhoy and Lisa Zhang Some slides from Derek Hoiem and Alysha
More informationA Least Squares Formulation for Canonical Correlation Analysis
A Least Squares Formulation for Canonical Correlation Analysis Liang Sun, Shuiwang Ji, and Jieping Ye Department of Computer Science and Engineering Arizona State University Motivation Canonical Correlation
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab 1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed in the
More informationEIGENVALUES AND SINGULAR VALUE DECOMPOSITION
APPENDIX B EIGENVALUES AND SINGULAR VALUE DECOMPOSITION B.1 LINEAR EQUATIONS AND INVERSES Problems of linear estimation can be written in terms of a linear matrix equation whose solution provides the required
More informationA Novel PCA-Based Bayes Classifier and Face Analysis
A Novel PCA-Based Bayes Classifier and Face Analysis Zhong Jin 1,2, Franck Davoine 3,ZhenLou 2, and Jingyu Yang 2 1 Centre de Visió per Computador, Universitat Autònoma de Barcelona, Barcelona, Spain zhong.in@cvc.uab.es
More informationReconnaissance d objetsd et vision artificielle
Reconnaissance d objetsd et vision artificielle http://www.di.ens.fr/willow/teaching/recvis09 Lecture 6 Face recognition Face detection Neural nets Attention! Troisième exercice de programmation du le
More informationSimultaneous and Orthogonal Decomposition of Data using Multimodal Discriminant Analysis
Simultaneous and Orthogonal Decomposition of Data using Multimodal Discriminant Analysis Terence Sim Sheng Zhang Jianran Li Yan Chen School of Computing, National University of Singapore, Singapore 117417.
More informationLecture 17: Face Recogni2on
Lecture 17: Face Recogni2on Dr. Juan Carlos Niebles Stanford AI Lab Professor Fei-Fei Li Stanford Vision Lab Lecture 17-1! What we will learn today Introduc2on to face recogni2on Principal Component Analysis
More informationFocus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations.
Previously Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations y = Ax Or A simply represents data Notion of eigenvectors,
More informationMachine Learning 11. week
Machine Learning 11. week Feature Extraction-Selection Dimension reduction PCA LDA 1 Feature Extraction Any problem can be solved by machine learning methods in case of that the system must be appropriately
More informationImage Analysis & Retrieval Lec 14 - Eigenface & Fisherface
CS/EE 5590 / ENG 401 Special Topics, Spring 2018 Image Analysis & Retrieval Lec 14 - Eigenface & Fisherface Zhu Li Dept of CSEE, UMKC http://l.web.umkc.edu/lizhu Office Hour: Tue/Thr 2:30-4pm@FH560E, Contact:
More informationExpectation Maximization
Expectation Maximization Machine Learning CSE546 Carlos Guestrin University of Washington November 13, 2014 1 E.M.: The General Case E.M. widely used beyond mixtures of Gaussians The recipe is the same
More informationLecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides
Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Intelligent Data Analysis and Probabilistic Inference Lecture
More information1 Singular Value Decomposition and Principal Component
Singular Value Decomposition and Principal Component Analysis In these lectures we discuss the SVD and the PCA, two of the most widely used tools in machine learning. Principal Component Analysis (PCA)
More informationNumerical Methods I Singular Value Decomposition
Numerical Methods I Singular Value Decomposition Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 9th, 2014 A. Donev (Courant Institute)
More informationDiscriminative K-means for Clustering
Discriminative K-means for Clustering Jieping Ye Arizona State University Tempe, AZ 85287 jieping.ye@asu.edu Zheng Zhao Arizona State University Tempe, AZ 85287 zhaozheng@asu.edu Mingrui Wu MPI for Biological
More informationEffective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data
Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the so-called high dimensional, low sample size (HDLSS) settings, LDA
More informationLecture 02 Linear Algebra Basics
Introduction to Computational Data Analysis CX4240, 2019 Spring Lecture 02 Linear Algebra Basics Chao Zhang College of Computing Georgia Tech These slides are based on slides from Le Song and Andres Mendez-Vazquez.
More informationNon-Iterative Heteroscedastic Linear Dimension Reduction for Two-Class Data
Non-Iterative Heteroscedastic Linear Dimension Reduction for Two-Class Data From Fisher to Chernoff M. Loog and R. P.. Duin Image Sciences Institute, University Medical Center Utrecht, Utrecht, The Netherlands,
More informationLocal Learning Projections
Mingrui Wu mingrui.wu@tuebingen.mpg.de Max Planck Institute for Biological Cybernetics, Tübingen, Germany Kai Yu kyu@sv.nec-labs.com NEC Labs America, Cupertino CA, USA Shipeng Yu shipeng.yu@siemens.com
More informationComparative Assessment of Independent Component. Component Analysis (ICA) for Face Recognition.
Appears in the Second International Conference on Audio- and Video-based Biometric Person Authentication, AVBPA 99, ashington D. C. USA, March 22-2, 1999. Comparative Assessment of Independent Component
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationCOS 429: COMPUTER VISON Face Recognition
COS 429: COMPUTER VISON Face Recognition Intro to recognition PCA and Eigenfaces LDA and Fisherfaces Face detection: Viola & Jones (Optional) generic object models for faces: the Constellation Model Reading:
More informationPrincipal Component Analysis and Linear Discriminant Analysis
Principal Component Analysis and Linear Discriminant Analysis Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1/29
More informationDynamical Systems Solutions to Exercises
Dynamical Systems Part 5-6 Dr G Bowtell Dynamical Systems Solutions to Exercises. Figure : Phase diagrams for i, ii and iii respectively. Only fixed point is at the origin since the equations are linear
More information14 Singular Value Decomposition
14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing
More informationA Tensor Approximation Approach to Dimensionality Reduction
Int J Comput Vis (2008) 76: 217 229 DOI 10.1007/s11263-007-0053-0 A Tensor Approximation Approach to Dimensionality Reduction Hongcheng Wang Narendra Ahua Received: 6 October 2005 / Accepted: 9 March 2007
More informationPCA FACE RECOGNITION
PCA FACE RECOGNITION The slides are from several sources through James Hays (Brown); Srinivasa Narasimhan (CMU); Silvio Savarese (U. of Michigan); Shree Nayar (Columbia) including their own slides. Goal
More informationDimensionality reduction
Dimensionality Reduction PCA continued Machine Learning CSE446 Carlos Guestrin University of Washington May 22, 2013 Carlos Guestrin 2005-2013 1 Dimensionality reduction n Input data may have thousands
More informationApproximating the Covariance Matrix with Low-rank Perturbations
Approximating the Covariance Matrix with Low-rank Perturbations Malik Magdon-Ismail and Jonathan T. Purnell Department of Computer Science Rensselaer Polytechnic Institute Troy, NY 12180 {magdon,purnej}@cs.rpi.edu
More informationParallel Singular Value Decomposition. Jiaxing Tan
Parallel Singular Value Decomposition Jiaxing Tan Outline What is SVD? How to calculate SVD? How to parallelize SVD? Future Work What is SVD? Matrix Decomposition Eigen Decomposition A (non-zero) vector
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed
More informationEECS 275 Matrix Computation
EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 6 1 / 22 Overview
More informationProbabilistic Class-Specific Discriminant Analysis
Probabilistic Class-Specific Discriminant Analysis Alexros Iosifidis Department of Engineering, ECE, Aarhus University, Denmark alexros.iosifidis@eng.au.dk arxiv:8.05980v [cs.lg] 4 Dec 08 Abstract In this
More informationIEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 31, NO. 5, MAY ASYMMETRIC PRINCIPAL COMPONENT ANALYSIS
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 31, NO. 5, MAY 2009 931 Short Papers Asymmetric Principal Component and Discriminant Analyses for Pattern Classification Xudong Jiang,
More informationLarge Scale Data Analysis Using Deep Learning
Large Scale Data Analysis Using Deep Learning Linear Algebra U Kang Seoul National University U Kang 1 In This Lecture Overview of linear algebra (but, not a comprehensive survey) Focused on the subset
More informationMathematical foundations - linear algebra
Mathematical foundations - linear algebra Andrea Passerini passerini@disi.unitn.it Machine Learning Vector space Definition (over reals) A set X is called a vector space over IR if addition and scalar
More informationFace recognition Computer Vision Spring 2018, Lecture 21
Face recognition http://www.cs.cmu.edu/~16385/ 16-385 Computer Vision Spring 2018, Lecture 21 Course announcements Homework 6 has been posted and is due on April 27 th. - Any questions about the homework?
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 12 Jan-Willem van de Meent (credit: Yijun Zhao, Percy Liang) DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Linear Dimensionality
More informationCITS 4402 Computer Vision
CITS 4402 Computer Vision A/Prof Ajmal Mian Adj/A/Prof Mehdi Ravanbakhsh Lecture 06 Object Recognition Objectives To understand the concept of image based object recognition To learn how to match images
More informationKeywords Eigenface, face recognition, kernel principal component analysis, machine learning. II. LITERATURE REVIEW & OVERVIEW OF PROPOSED METHODOLOGY
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Eigenface and
More informationA Flexible and Efficient Algorithm for Regularized Fisher Discriminant Analysis
A Flexible and Efficient Algorithm for Regularized Fisher Discriminant Analysis Zhihua Zhang 1, Guang Dai 1, and Michael I. Jordan 2 1 College of Computer Science and Technology Zhejiang University Hangzhou,
More informationLinear Subspace Models
Linear Subspace Models Goal: Explore linear models of a data set. Motivation: A central question in vision concerns how we represent a collection of data vectors. The data vectors may be rasterized images,
More informationCS 340 Lec. 6: Linear Dimensionality Reduction
CS 340 Lec. 6: Linear Dimensionality Reduction AD January 2011 AD () January 2011 1 / 46 Linear Dimensionality Reduction Introduction & Motivation Brief Review of Linear Algebra Principal Component Analysis
More informationDiscriminative Direction for Kernel Classifiers
Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering
More information