COVARIANCE REGULARIZATION FOR SUPERVISED LEARNING IN HIGH DIMENSIONS
DANIEL L. ELLIOTT (dane@cs.colostate.edu)
CHARLES W. ANDERSON
Department of Computer Science, Colorado State University, Fort Collins, Colorado, USA

MICHAEL KIRBY
Department of Mathematics, Colorado State University, Fort Collins, Colorado, USA

ABSTRACT

This paper studies the effect of covariance regularization on the classification of high-dimensional data. This is done by fitting a mixture of Gaussians with a regularized covariance matrix to each class. Three data sets from different domains are chosen to suggest that the results apply to any domain with high-dimensional data. The regularization needs of the data are also compared when the data are pre-processed using the dimensionality reduction techniques principal component analysis (PCA) and random projection. One observation is that a large amount of covariance regularization consistently provides classification accuracy as good as, if not better than, using little or no covariance regularization. The results also indicate that random projection complements covariance regularization.

1 INTRODUCTION

When classifying high-dimensional data, the mixture of Gaussians (MoG) model has been largely neglected in the literature in favor of approximations to a mixture of Gaussians. Another common solution is to reduce the dimension of the data prior to learning. PCA is the most popular method in these situations, but random projection is gaining attention (Candes & Tao 2006). Covariance regularization remains an active research topic (Robinson 2009). This paper applies a mixture of Gaussians model, learned via expectation-maximization (EM) with shrinkage covariance regularization, to three data sets from different domains. The effect of covariance regularization is examined in conjunction with both of these dimensionality reduction techniques. Section 2 begins by presenting several popular algorithms as MoG with covariance regularization. After presenting the experimental methodology in Section 3 and experimental results in Section 4, conclusions are summarized in Section 5.

2 BACKGROUND

This section summarizes MoG with covariance regularization and presents several long-standing algorithms as MoG with covariance regularization.
2.1 COVARIANCE REGULARIZATION

Covariance regularization is simply a method for reducing the number of free parameters in a model. With too little data to estimate the parameters' true values accurately, the result is over-fitting, which leads to poor generalization to unseen data. When using a MoG, too little data can also cause the learned covariance matrix to be singular. When a mixture of Gaussians is applied to high-dimensional data, it is therefore commonly constrained (e.g., using only the diagonal of the covariance matrix) to ensure non-singularity (Hammoud & Mohr 2000).

Many variations on the mixture of Gaussians model have been proposed to tackle these problems. Moghaddam and Pentland proposed the use of a mixture of Gaussians on PCA-projected data for object detection and tested it on human faces, human hands, and facial features such as eyes, nose, and mouth (Moghaddam & Pentland 1997). They show how to use principal component analysis to estimate a Gaussian distribution. Hinton et al. applied a mixture of factor analyzers (MFA) to estimate a MoG for a handwritten digit classification problem (Hinton et al. 1997). They proposed that factor analysis, performed locally on each cluster, could model the manifold formed through the application of small image transformations to each class prototype image. Each digit class may have multiple, separated, continuous manifolds; therefore each class is represented using a mixture of factor analyzers. Tipping and Bishop brought principal component analysis to the world of mixture of Gaussians through their mixture of probabilistic principal component analyzers (MPPCA) (Tipping & Bishop 1999). MPPCA is based upon MFA, but FA is replaced by PCA performed locally for each cluster.

Researchers have long focused on linear discriminant analysis (LDA) for classification problems. LDA and quadratic discriminant analysis (QDA) have roots in a mixture of Gaussians.
LDA assumes that each class is modeled using a single Gaussian constrained to share a covariance matrix with the other classes. QDA allows each class to be modeled using a single, unique Gaussian, resulting in a non-linear decision boundary (Bishop 2007). The LDA estimate can be regularized further, as done in PCA+LDA (Belhumeur et al. 1997). In recent years several variations on this theme have appeared in which the PCA null space (eigenvectors associated with zero eigenvalues) is exploited, or completely ignored for its lack of discriminatory information. Regularized discriminant analysis (RDA) is one such technique (Ye & Wang 2006).

Transformation methods such as MFA and MPPCA are presented not as covariance regularization methods but as manifold and subspace learning methods. These algorithms allow simultaneous classification and dimensionality reduction, an improvement upon previous efforts which created subspaces prior to or after clustering. The benefit of these algorithms was considered to be their invariance to small image transformations, and their relationship to covariance regularization has been largely ignored.

The data itself may also be modified to require less covariance to achieve a good fit. PCA is a popular method for data whitening. Random projection has also been shown to whiten the data (Dasgupta 2000, Deegalla & Bostrom 2006).

2.2 MIXTURE OF GAUSSIANS WITH COVARIANCE REGULARIZATION

MoG is historically the most used and researched mixture model. Its strength comes from its ability to approximate any density function. Shrinkage is the covariance regularization approach applied in these experiments; it has the benefit of being simple and efficient to calculate. A shrinkage estimate, Σ̂, usually involves a parameter, λ, that balances between the empirical covariance matrix, Σ, and some target matrix, T:

    Σ̂ = λT + (1 − λ)Σ    (1)

Two target matrices from (Schäfer & Strimmer 2005) are considered in this paper, computed using (2) or (3):

    T_ij = 1 if i = j, 0 if i ≠ j    (2)

    T_ij = σ_ii if i = j, 0 if i ≠ j    (3)

When λ = 1, (2) is equivalent to fuzzy k-means (Mitchell 1997), while (3) is equivalent to a MoG with only variance terms. These two versions will be referred to as FKM and DIAG, respectively.

3 METHODOLOGY

In these experiments, a MoG is fit to each class using EM (Bishop 2007) with the additional shrinkage step (1). For implementation details, see (Elliott 2009). Although a mixture of Gaussians is fit to each class in an unsupervised way, supervised learning is still performed by fitting a MoG to each class and then computing the Bayes optimal classification (Mitchell 1997):

    argmax_{c ∈ C} P(c) p(x | Θ_c)    (4)

Here, P(c) is simply the fraction of the training data that belongs to class c, and p(x | Θ_c) is the probability of a data point given the MoG model for class c.

3.1 DATA SETS

Three data sets are used for experimentation: mfeat, isolet, and a set of appearance-based image data. The appearance-based image data set consists of three different classes collected from the Internet: cat/dog faces, Christmas trees, and sunsets (93 images) (Elliott 2009). The cat/dog images have been manipulated to include only the face of the animal and are hand-registered using the eyes. The Christmas tree images were chosen to have a tree in the middle, and the sunset images were chosen to have a bright middle region and a dark lower region. The multiple features (mfeat) data set (Frank & Asuncion 2010) consists of 649 features extracted from 2,000 samples of handwritten digits. The isolet data set (Frank & Asuncion 2010) consists of 617 features extracted from 7,797 samples of 150 subjects speaking each letter of the alphabet twice.

3.2 DESCRIPTION OF EXPERIMENTS

LDA and logistic regression (LogReg) (Bishop 2007) results are included with each experiment for comparison.
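The shrinkage estimate (1) with the two targets (2) and (3) can be sketched in NumPy. This is an illustrative reconstruction, not the paper's implementation; the function name and toy problem sizes are invented for the example:

```python
import numpy as np

def shrinkage_covariance(X, lam, target="diag"):
    """Shrinkage estimate of Eq. (1): Sigma_hat = lam * T + (1 - lam) * Sigma.

    target="identity" uses the identity target of Eq. (2) (the FKM variant);
    target="diag" uses the diagonal target of Eq. (3) (the DIAG variant).
    """
    Sigma = np.cov(X, rowvar=False)      # empirical covariance of the data
    if target == "identity":
        T = np.eye(Sigma.shape[0])       # Eq. (2): T_ij = 1 if i == j, else 0
    else:
        T = np.diag(np.diag(Sigma))      # Eq. (3): T_ij = sigma_ii if i == j, else 0
    return lam * T + (1.0 - lam) * Sigma

# Toy high-dimensional setting: 10 samples in 50 dimensions, so the
# empirical covariance is singular, but the shrunk estimate is not.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))
S = shrinkage_covariance(X, lam=0.9, target="diag")
print(np.all(np.linalg.eigvalsh(S) > 0))   # shrunk estimate is positive definite
```

Because the targets in (2) and (3) are positive definite whenever every feature has non-zero variance, any λ > 0 yields an invertible covariance even when the sample count is far below the dimension, which is the situation this paper studies.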
Supervised MoG with shrinkage has two experimental parameters: C ∈ {1, 2, 3}, the number of clusters per class, and λ ∈ [0, 1], the shrinkage parameter. The best combination of experimental parameter values is chosen using cross-validation. Algorithm comparison is summarized by the classification accuracy averaged over a number of random partitions of the data. LogReg and LDA have no experimental parameters. Random projection matrices are obtained via the QR decomposition of a random matrix. The PCA subspace is created from the training data, X.
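The random projection step described above, an orthonormal basis from the QR decomposition of a random matrix, can be sketched as follows. The function name and dimensions are invented for illustration:

```python
import numpy as np

def random_projection_basis(d_in, d_out, rng):
    """Orthonormal random basis via QR decomposition of a Gaussian matrix,
    as described for these experiments. Columns of Q span a random subspace."""
    A = rng.normal(size=(d_in, d_out))
    Q, _ = np.linalg.qr(A)               # Q: (d_in, d_out) with orthonormal columns
    return Q

rng = np.random.default_rng(0)
Q = random_projection_basis(617, 50, rng)   # e.g. isolet's 617 features down to 50
X = rng.normal(size=(100, 617))             # 100 hypothetical samples
X_proj = X @ Q                              # each row projected to 50 dimensions
print(X_proj.shape)                         # (100, 50)
```

Unlike the PCA basis, Q is independent of the training data, which is one reason the paper can compare how "data-specific" versus "data-agnostic" subspaces interact with the regularization parameter λ.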
4 RESULTS

Figure 1 shows the classification accuracy on the appearance-based image data. These algorithms performed similarly on the validation data and the testing data, indicating that cross-validation is a good method for choosing parameters. FKM, DIAG, and LogReg perform similarly until the dimension hits 2, at which point LogReg's performance begins to decline while DIAG and FKM enjoy their best performance. This disparity may be a result of the data being multi-modal in higher dimensions.

Figure 1: Classification accuracy on appearance-based image data for both projection methods: (a) PCA-projected data, (b) random-projected data.

As the dimension of the data increases, the performance of DIAG decreases to a level below that of LogReg on the PCA-projected data. DIAG is much more competitive when the data is pre-processed using random projection, beating LogReg in all but the smallest dimensions. DIAG's performance drop-off on the PCA-projected data occurs where the number of eigenvectors first exceeds the number of training samples. FKM performs consistently at or near the top when pre-processed using either projection method and, along with LogReg, has nearly identical classification accuracies for both projection methods. Unlike with the randomly-projected data, LDA is occasionally able to run without becoming numerically unstable on the PCA-projected data, because the PCA subspace has just enough variance in these dimensions that the covariance matrix used by LDA is non-singular.

Figure 3 shows the results of the mfeat and isolet experiments. The differences in performance between LogReg, DIAG, and FKM are much less pronounced with these two data sets for both projection methods. This is most likely a result of the two data sets being much larger and possibly uni-modal (which assists LogReg and LDA). However, the graphs of the three data sets' results have similar features.
Figure 3: Classification accuracy on the mfeat and isolet data sets for both the PCA and RND projection methods: (a) mfeat/PCA, (b) mfeat/RND, (c) isolet/PCA, (d) isolet/RND.

Compared with PCA, random projection shows a much more dramatic reduction in performance once too many dimensions are dropped. This is because the PCA subspace dimensions are sorted by how much variance they capture, while there is nothing special about the first random dimensions. Also, DIAG's performance eventually drops as an increasing number of noisy PCA subspace dimensions are retained, while random projection shows consistent performance across dimensions.

In addition to the dimension and projection method, the training set size, |X|, is also varied for these two data sets, but the dominance of FKM over DIAG and LogReg seen with the appearance-based image data is not replicated. In fact, LogReg, due to its much lower number of parameters, is unaffected by smaller |X|. Figure 5, which displays the results for DIAG and FKM for both projection methods for various λ values across dimension and |X| on the mfeat and isolet data sets, shows a similar relationship between λ and classification accuracy: higher λ values are best up to around 0.9, after which accuracy drops to such an extent that the largest λ values are among the worst performing.

Figure 5: Classification accuracy of various λ values for both covariance regularization methods, both projection methods, and the mfeat and isolet data sets: (a) DIAG/PCA/mfeat, (b) FKM/PCA/mfeat, (c) DIAG/RND/mfeat, (d) FKM/RND/mfeat, (e) DIAG/PCA/isolet, (f) FKM/PCA/isolet, (g) DIAG/RND/isolet, (h) FKM/RND/isolet.

Figure 2 shows how the classification accuracies of the two MoG versions are affected by the choice of λ for both projection schemes on the appearance-based data. Figures 2a and 2c show a preference toward higher λ values for FKM, with a notable exception at λ = 1. Figure 2b shows a less clear relationship for λ with DIAG on the PCA-projected data. Figure 2c shows the performance of FKM with random projection and λ = 1 rising as the dimension increases, and there is no reason to believe that it will not eventually draw even with the other λ values. Figure 2d reveals no apparent favorite value of λ for DIAG with random projection, while λ = 1 is inconsistent and less desirable than the other values on average.

Figure 2: Classification accuracy with respect to dimension for both DIAG and FKM versions with several λ values on test data for both projection methods on the appearance-based image data: (a) PCA with FKM, (b) PCA with DIAG, (c) random projection with FKM, (d) random projection with DIAG.

5 CONCLUSIONS

By a very small margin, FKM with a high λ value and random projection keeping many dimensions (little to no dimensionality reduction) gives the best performance. The difference in DIAG's performance between the two projection methods as dimension increases is most likely a result of the later PCA dimensions being uninformative noise. Recall that DIAG is unable to remove any variance; therefore, noisy dimensions have a more negative effect on it than on its FKM counterpart. This drop in performance is also present on the training data. At lower dimensions, random projection pre-processing performs worse than PCA. However, once the dimensionality reaches a certain level, the performance of FKM and LogReg remains steady for both projection methods.

Figure 4: Classification accuracy of various λ values with DIAG on the mfeat data set for the PCA projection method. The number beneath each plot is the percentage of the 2,000 data points used for training.

Figure 4 shows an odd relationship between the number of PCA dimensions kept and the classification accuracy of DIAG as |X| decreases. As expected, the dimension at which DIAG's performance drops off for all λ values decreases along with |X|. However, as |X| shrinks, the magnitude of this dip decreases as well. This may be explained by a smaller |X| decreasing the specificity of the PCA subspace to the training data. The dropped PCA dimensions for large |X| are almost pure noise, while the PCA subspace dimensions computed from a small |X| are more general. Therefore, keeping all PCA dimensions computed from a small |X| results in better performance than for a large |X| once the dimension exceeds the number of training samples, in part because these later dimensions are now more like random projection.

Most importantly, FKM with high λ is consistently at the top in these experiments. This tells us that some covariance information is necessary, but a little covariance goes a long way toward classifying this data and generalizing to unseen data, while adding additional covariance primarily increases the risk of over-fitting with little chance of improving classification accuracy. In addition to the promise of projection onto a high-dimensional random basis combined with FKM at high λ, our results show a simpler relationship between the choice of λ and the number of dimensions to keep when projecting with a random projection. By comparison, PCA pre-processing, still the most popular dimensionality reduction technique in many domains, is fussy, and its optimal number of dimensions differs for each data set. If a near-optimal number of subspace dimensions is not chosen, there is a great deal of variation in performance between λ values for DIAG. Another option is to apply FKM when pre-processing via PCA projection. However, computation of a random basis is much faster than PCA computation. Either way, it appears important to involve a large degree of covariance regularization, through either random projection or FKM with high λ values.

For all data sets, as the dimension of the randomly generated basis increases, the disparity in performance between the varying levels of covariance regularization decreases. If this trend were to continue, one could expect FKM with λ = 1 to become competitive with all λ values for FKM and DIAG, and k-means could replace MoG. In this case, projecting data into a higher dimension and using k-means would yield an algorithm with superior accuracy and improved computational complexity. Investigating this further is left to future work.

These observations span the three data sets used in this paper, which represent the use of raw data and of features computed from the data, and situations with both sufficient and insufficient training data. Isolet and mfeat may be uni-modal and are a good fit for LogReg when the training set size is diminished. Otherwise, the experimental results indicate that random projection with little dimensionality reduction, combined with FKM with high λ, is a safe bet for obtaining good classification accuracy.

References

Belhumeur, P., Hespanha, J. & Kriegman, D. (1997), Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7).
Bishop, C. M. (2007), Pattern Recognition and Machine Learning, Springer.
Candes, E. & Tao, T. (2006), Near-optimal signal recovery from random projections: Universal encoding strategies, IEEE Transactions on Information Theory 52.
Dasgupta, S. (2000), Experiments with random projections, in Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence.
Deegalla, S. & Bostrom, H. (2006), Reducing high-dimensional data by principal component analysis vs. random projection for nearest neighbor classification, in ICMLA '06, IEEE Computer Society, Washington, DC, USA.
Elliott, D. L. (2009), Covariance regularization in mixture of Gaussians for high-dimensional image classification, Master's thesis, Colorado State University.
Frank, A. & Asuncion, A. (2010), UCI machine learning repository.
Hammoud, R. & Mohr, R. (2000), Mixture densities for video objects recognition, International Conference on Pattern Recognition.
Hinton, G., Dayan, P. & Revow, M. (1997), Modelling the manifolds of images of handwritten digits, IEEE Transactions on Neural Networks 8(1).
Mitchell, T. M. (1997), Machine Learning, McGraw-Hill, Boston.
Moghaddam, B. & Pentland, A. (1997), Probabilistic visual learning for object representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7).
Robinson, J. A. (2009), Covariance estimation in full- and reduced-dimensionality image classification, Image and Vision Computing 27(8).
Schäfer, J. & Strimmer, K. (2005), A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics, Statistical Applications in Genetics and Molecular Biology 4(1).
Tipping, M. & Bishop, C. (1999), Mixtures of probabilistic principal component analysers, Neural Computation 11(2).
Ye, J. & Wang, T. (2006), Regularized discriminant analysis for high dimensional, low sample size data, in KDD 2006, ACM, New York, NY, USA.
More informationSupervised locally linear embedding
Supervised locally linear embedding Dick de Ridder 1, Olga Kouropteva 2, Oleg Okun 2, Matti Pietikäinen 2 and Robert P.W. Duin 1 1 Pattern Recognition Group, Department of Imaging Science and Technology,
More informationDiscriminative Direction for Kernel Classifiers
Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering
More informationOn Improving the k-means Algorithm to Classify Unclassified Patterns
On Improving the k-means Algorithm to Classify Unclassified Patterns Mohamed M. Rizk 1, Safar Mohamed Safar Alghamdi 2 1 Mathematics & Statistics Department, Faculty of Science, Taif University, Taif,
More information5. Discriminant analysis
5. Discriminant analysis We continue from Bayes s rule presented in Section 3 on p. 85 (5.1) where c i is a class, x isap-dimensional vector (data case) and we use class conditional probability (density
More informationMachine Learning 11. week
Machine Learning 11. week Feature Extraction-Selection Dimension reduction PCA LDA 1 Feature Extraction Any problem can be solved by machine learning methods in case of that the system must be appropriately
More informationCS168: The Modern Algorithmic Toolbox Lecture #7: Understanding Principal Component Analysis (PCA)
CS68: The Modern Algorithmic Toolbox Lecture #7: Understanding Principal Component Analysis (PCA) Tim Roughgarden & Gregory Valiant April 0, 05 Introduction. Lecture Goal Principal components analysis
More informationW vs. QCD Jet Tagging at the Large Hadron Collider
W vs. QCD Jet Tagging at the Large Hadron Collider Bryan Anenberg: anenberg@stanford.edu; CS229 December 13, 2013 Problem Statement High energy collisions of protons at the Large Hadron Collider (LHC)
More informationMultiple Similarities Based Kernel Subspace Learning for Image Classification
Multiple Similarities Based Kernel Subspace Learning for Image Classification Wang Yan, Qingshan Liu, Hanqing Lu, and Songde Ma National Laboratory of Pattern Recognition, Institute of Automation, Chinese
More informationLecture 13 Visual recognition
Lecture 13 Visual recognition Announcements Silvio Savarese Lecture 13-20-Feb-14 Lecture 13 Visual recognition Object classification bag of words models Discriminative methods Generative methods Object
More informationIEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 31, NO. 5, MAY ASYMMETRIC PRINCIPAL COMPONENT ANALYSIS
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 31, NO. 5, MAY 2009 931 Short Papers Asymmetric Principal Component and Discriminant Analyses for Pattern Classification Xudong Jiang,
More informationLecture 17: Face Recogni2on
Lecture 17: Face Recogni2on Dr. Juan Carlos Niebles Stanford AI Lab Professor Fei-Fei Li Stanford Vision Lab Lecture 17-1! What we will learn today Introduc2on to face recogni2on Principal Component Analysis
More informationChemometrics: Classification of spectra
Chemometrics: Classification of spectra Vladimir Bochko Jarmo Alander University of Vaasa November 1, 2010 Vladimir Bochko Chemometrics: Classification 1/36 Contents Terminology Introduction Big picture
More informationHigh Dimensional Discriminant Analysis
High Dimensional Discriminant Analysis Charles Bouveyron LMC-IMAG & INRIA Rhône-Alpes Joint work with S. Girard and C. Schmid High Dimensional Discriminant Analysis - Lear seminar p.1/43 Introduction High
More informationLecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University
Lecture 4: Principal Component Analysis Aykut Erdem May 016 Hacettepe University This week Motivation PCA algorithms Applications PCA shortcomings Autoencoders Kernel PCA PCA Applications Data Visualization
More informationUnsupervised Learning with Permuted Data
Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA
More informationAnalysis of Spectral Kernel Design based Semi-supervised Learning
Analysis of Spectral Kernel Design based Semi-supervised Learning Tong Zhang IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Rie Kubota Ando IBM T. J. Watson Research Center Yorktown Heights,
More informationManifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA
Manifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inria.fr http://perception.inrialpes.fr/
More informationLearning features by contrasting natural images with noise
Learning features by contrasting natural images with noise Michael Gutmann 1 and Aapo Hyvärinen 12 1 Dept. of Computer Science and HIIT, University of Helsinki, P.O. Box 68, FIN-00014 University of Helsinki,
More informationExpectation Maximization
Expectation Maximization Machine Learning CSE546 Carlos Guestrin University of Washington November 13, 2014 1 E.M.: The General Case E.M. widely used beyond mixtures of Gaussians The recipe is the same
More informationMachine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling
Machine Learning B. Unsupervised Learning B.2 Dimensionality Reduction Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University
More informationCS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision)
CS4495/6495 Introduction to Computer Vision 8B-L2 Principle Component Analysis (and its use in Computer Vision) Wavelength 2 Wavelength 2 Principal Components Principal components are all about the directions
More informationChap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University
Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics
More informationLinear Subspace Models
Linear Subspace Models Goal: Explore linear models of a data set. Motivation: A central question in vision concerns how we represent a collection of data vectors. The data vectors may be rasterized images,
More informationMachine Learning Techniques for Computer Vision
Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM
More informationA Modified Incremental Principal Component Analysis for On-line Learning of Feature Space and Classifier
A Modified Incremental Principal Component Analysis for On-line Learning of Feature Space and Classifier Seiichi Ozawa, Shaoning Pang, and Nikola Kasabov Graduate School of Science and Technology, Kobe
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationUnsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto
Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian
More informationPrincipal Component Analysis -- PCA (also called Karhunen-Loeve transformation)
Principal Component Analysis -- PCA (also called Karhunen-Loeve transformation) PCA transforms the original input space into a lower dimensional space, by constructing dimensions that are linear combinations
More informationPCA and admixture models
PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1
More informationClassifier Complexity and Support Vector Classifiers
Classifier Complexity and Support Vector Classifiers Feature 2 6 4 2 0 2 4 6 8 RBF kernel 10 10 8 6 4 2 0 2 4 6 Feature 1 David M.J. Tax Pattern Recognition Laboratory Delft University of Technology D.M.J.Tax@tudelft.nl
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks Delivered by Mark Ebden With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable
More informationSemi-Supervised Learning through Principal Directions Estimation
Semi-Supervised Learning through Principal Directions Estimation Olivier Chapelle, Bernhard Schölkopf, Jason Weston Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany {first.last}@tuebingen.mpg.de
More informationClassification. The goal: map from input X to a label Y. Y has a discrete set of possible values. We focused on binary Y (values 0 or 1).
Regression and PCA Classification The goal: map from input X to a label Y. Y has a discrete set of possible values We focused on binary Y (values 0 or 1). But we also discussed larger number of classes
More informationBayesian Classifiers and Probability Estimation. Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington
Bayesian Classifiers and Probability Estimation Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington 1 Data Space Suppose that we have a classification problem The
More informationLecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26
Principal Component Analysis Brett Bernstein CDS at NYU April 25, 2017 Brett Bernstein (CDS at NYU) Lecture 13 April 25, 2017 1 / 26 Initial Question Intro Question Question Let S R n n be symmetric. 1
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationCOS 429: COMPUTER VISON Face Recognition
COS 429: COMPUTER VISON Face Recognition Intro to recognition PCA and Eigenfaces LDA and Fisherfaces Face detection: Viola & Jones (Optional) generic object models for faces: the Constellation Model Reading:
More informationFace Recognition Using Multi-viewpoint Patterns for Robot Vision
11th International Symposium of Robotics Research (ISRR2003), pp.192-201, 2003 Face Recognition Using Multi-viewpoint Patterns for Robot Vision Kazuhiro Fukui and Osamu Yamaguchi Corporate Research and
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More informationRegularized Discriminant Analysis and Reduced-Rank LDA
Regularized Discriminant Analysis and Reduced-Rank LDA Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Regularized Discriminant Analysis A compromise between LDA and
More informationAlgorithm-Independent Learning Issues
Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning
More informationSubspace Methods for Visual Learning and Recognition
This is a shortened version of the tutorial given at the ECCV 2002, Copenhagen, and ICPR 2002, Quebec City. Copyright 2002 by Aleš Leonardis, University of Ljubljana, and Horst Bischof, Graz University
More informationA Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier
A Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier Seiichi Ozawa 1, Shaoning Pang 2, and Nikola Kasabov 2 1 Graduate School of Science and Technology,
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationMotivating the Covariance Matrix
Motivating the Covariance Matrix Raúl Rojas Computer Science Department Freie Universität Berlin January 2009 Abstract This note reviews some interesting properties of the covariance matrix and its role
More informationLatent Variable Models and EM Algorithm
SC4/SM8 Advanced Topics in Statistical Machine Learning Latent Variable Models and EM Algorithm Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/atsml/
More informationProbabilistic & Unsupervised Learning
Probabilistic & Unsupervised Learning Week 2: Latent Variable Models Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College
More information