COVARIANCE REGULARIZATION FOR SUPERVISED LEARNING IN HIGH DIMENSIONS


DANIEL L. ELLIOTT, CHARLES W. ANDERSON
Department of Computer Science, Colorado State University, Fort Collins, Colorado, USA
dane@cs.colostate.edu

MICHAEL KIRBY
Department of Mathematics, Colorado State University, Fort Collins, Colorado, USA

ABSTRACT

This paper studies the effect of covariance regularization on the classification of high-dimensional data. This is done by fitting a mixture of Gaussians with a regularized covariance matrix to each class. Three data sets from different domains are chosen to suggest that the results apply to any domain with high-dimensional data. The regularization needs of the data when pre-processed using the dimensionality reduction techniques principal component analysis (PCA) and random projection are also compared. We observe that using a large amount of covariance regularization consistently provides classification accuracy as good as, if not better than, using little or no covariance regularization. The results also indicate that random projection complements covariance regularization.

1 INTRODUCTION

When classifying high-dimensional data, the mixture of Gaussians (MoG) model has been largely neglected in the literature in favor of constrained approximations to a mixture of Gaussians. Another common solution is to reduce the dimension of the data prior to learning. PCA is the most popular method in these situations, but random projection is gaining attention (Candes & Tao 2006). Covariance regularization remains an active research topic (Robinson 2009). This paper applies a mixture of Gaussians model, learned via expectation-maximization (EM) with shrinkage covariance regularization, to three data sets from different domains. The effect of covariance regularization is examined in conjunction with both of these dimensionality reduction techniques. Section 2 begins by presenting several popular algorithms as MoG with covariance regularization. After presenting the experimental methodology in Section 3 and the experimental results in Section 4, conclusions are summarized in Section 5.

2 BACKGROUND

This section summarizes MoG with covariance regularization and presents several long-standing algorithms as MoG with covariance regularization.

2.1 COVARIANCE REGULARIZATION

Covariance regularization is simply a method for reducing the number of free parameters in a model. The effect of having too little data to provide an accurate estimate of the parameters' true values is over-fitting, which results in poor generalization to unseen data. When using a MoG, it can also cause the learned covariance matrix to be singular. When a mixture of Gaussians is applied to high-dimensional data, it is therefore commonly constrained (e.g., using only the diagonal of the covariance matrix) to ensure non-singularity; see, for example, (Hammoud & Mohr 2000).

Many variations on the mixture of Gaussians model have been proposed to tackle these problems. Moghaddam and Pentland proposed the use of a mixture of Gaussians on PCA-projected data for object detection and tested it on human faces, human hands, and facial features such as eyes, nose, and mouth (Moghaddam & Pentland 1997). They show how to use principal component analysis to estimate a Gaussian distribution. Hinton et al. applied a mixture of factor analyzers (MFA) to estimate a MoG for a handwritten digit classification problem (Hinton et al. 1997). They proposed that factor analysis, performed locally on each cluster, could model the manifold formed through the application of small image transformations to each class prototype image. Each digit class may have multiple, separated, continuous manifolds; therefore each class is represented using a mixture of factor analyzers. Tipping and Bishop brought principal component analysis to the world of mixtures of Gaussians through their mixture of probabilistic principal component analyzers (MPPCA) (Tipping & Bishop 1999). MPPCA is based upon MFA, but FA is replaced by PCA performed locally for each cluster.

Researchers have long focused on linear discriminant analysis (LDA) for classification problems. LDA and quadratic discriminant analysis (QDA) have roots in a mixture of Gaussians. LDA assumes that each class is modeled using a single Gaussian constrained to share a covariance matrix with the other classes. QDA allows each class to be modeled using a single, unique Gaussian, resulting in a non-linear decision boundary (Bishop 2007). The LDA estimate can be regularized further, as is done in PCA+LDA (Belhumeur et al. 1997). In recent years several variations on this theme have appeared in which the PCA null space (eigenvectors associated with zero eigenvalues) is exploited, or ignored entirely for its lack of discriminatory information. Regularized discriminant analysis (RDA) is one such technique (Ye & Wang 2006).

The transformation methods such as MFA and MPPCA are presented not as covariance regularization methods but as manifold and subspace learning methods. These algorithms allow for simultaneous classification and dimensionality reduction, an improvement upon previous efforts which created subspaces prior to or after clustering. The benefit of these algorithms was considered to be their invariance to small image transformations, and their relationship to covariance regularization has been largely ignored.

The data itself may also be modified to require less covariance to achieve a good fit. PCA is a popular method for data whitening. Random projection has also been shown to whiten the data (Dasgupta 2000, Deegalla & Bostrom 2006).

2.2 MIXTURE OF GAUSSIANS WITH COVARIANCE REGULARIZATION

MoG is historically the most used and researched mixture model. Its strength comes from its ability to approximate any density function.
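As a concrete illustration of the singularity problem noted above, the short sketch below (not from the paper; the sample and dimension counts are made up) shows that an empirical covariance estimated from fewer samples than dimensions is rank-deficient, while its diagonal restriction remains invertible.

    # Minimal sketch (not from the paper): an empirical covariance estimated
    # from fewer samples than dimensions is singular, while its diagonal
    # restriction stays invertible.  Sample/dimension counts are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 50, 200                       # fewer samples than dimensions
    X = rng.normal(size=(n, d))

    S = np.cov(X, rowvar=False)          # d x d, but rank is at most n - 1
    print(np.linalg.matrix_rank(S))      # well below d, so S cannot be inverted

    S_diag = np.diag(np.diag(S))         # keep only the per-feature variances
    print(np.all(np.diag(S_diag) > 0))   # positive variances -> invertible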
Shrinkage is the covariance regularization approach applied in these experiments. Shrinkage has the benefit of being simple and efficient to calculate. A shrinkage estimate, Σ*, usually involves a parameter, λ, that balances between the empirical covariance matrix, Σ, and some target matrix, T:

Σ* = λ T + (1 − λ) Σ.    (1)

Two target matrices from (Schäfer & Strimmer 2005) are considered in this paper, computed using (2) or (3):

T_ij = 1 if i = j, and 0 if i ≠ j;    (2)

T_ij = σ_ii if i = j, and 0 if i ≠ j.    (3)

When λ = 1, (2) is equivalent to fuzzy k-means (Mitchell 1997), while (3) is equivalent to a MoG with only variances (a diagonal covariance matrix). These two versions will be referred to as FKM and DIAG, respectively.
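A minimal sketch of the shrinkage estimate in (1) with the two targets in (2) and (3) follows; the function and argument names are illustrative and not taken from the paper.

    # Sketch of the shrinkage estimate in Eq. (1) for the two targets in
    # Eqs. (2) and (3).  Names are illustrative, not from the paper.
    import numpy as np

    def shrink_covariance(S, lam, target="identity"):
        """Return lam * T + (1 - lam) * S for the chosen target matrix T."""
        d = S.shape[0]
        if target == "identity":         # Eq. (2): T = I       ("FKM" variant)
            T = np.eye(d)
        elif target == "diagonal":       # Eq. (3): T = diag(S) ("DIAG" variant)
            T = np.diag(np.diag(S))
        else:
            raise ValueError("unknown target")
        return lam * T + (1.0 - lam) * S

    # At lam = 1 the identity target gives spherical clusters (fuzzy k-means),
    # while the diagonal target keeps only the per-feature variances.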
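The following sketch illustrates the per-class MoG fit and the decision rule (4). It uses scikit-learn's GaussianMixture as a stand-in for the paper's EM-with-shrinkage; GaussianMixture does not implement the shrinkage step of (1) (its reg_covar option only adds a small constant to the diagonal), so this shows the classification rule only, not the exact estimator used in the experiments.

    # Sketch of the decision rule in Eq. (4): fit one mixture per class, then
    # predict argmax_c P(c) p(x | Theta_c).  GaussianMixture is a stand-in and
    # does NOT apply the shrinkage step of Eq. (1).
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_class_mixtures(X, y, n_components=2, seed=0):
        models, log_priors = {}, {}
        for c in np.unique(y):
            Xc = X[y == c]
            models[c] = GaussianMixture(n_components=n_components,
                                        random_state=seed).fit(Xc)
            log_priors[c] = np.log(len(Xc) / len(X))   # P(c): class fraction
        return models, log_priors

    def predict(models, log_priors, X):
        classes = sorted(models)
        # log P(c) + log p(x | Theta_c) for every class, then argmax over c
        scores = np.column_stack([log_priors[c] + models[c].score_samples(X)
                                  for c in classes])
        return np.asarray(classes)[np.argmax(scores, axis=1)]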

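The two pre-processing projections of Section 3.2 can be sketched as follows; the dimensions, seed, and function names are illustrative assumptions, not taken from the paper.

    # Sketch of the two pre-processing projections described in Section 3.2.
    import numpy as np

    def random_projection_basis(d, k, seed=0):
        """Orthonormal basis of a random k-dimensional subspace, from the QR
        decomposition of a random d x k Gaussian matrix."""
        rng = np.random.default_rng(seed)
        Q, _ = np.linalg.qr(rng.normal(size=(d, k)))
        return Q                          # d x k, orthonormal columns

    def pca_basis(X_train, k):
        """Top-k principal directions of the mean-centred training data
        (assumes k does not exceed the number of training samples)."""
        Xc = X_train - X_train.mean(axis=0)
        # right singular vectors of the centred data are the principal directions
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Vt[:k].T                   # d x k

    # Either basis B projects a sample x as x @ B before the MoG is fit.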
4 RESULTS

Figure 1 shows the classification accuracy on the appearance-based image data. These algorithms performed similarly on the validation data and the testing data, indicating that cross-validation is a good method for choosing parameters. FKM, DIAG, and LogReg perform similarly until the dimension becomes large, at which point LogReg's performance begins to decline while DIAG and FKM enjoy their best performance. This disparity may be a result of the data being multi-modal in the higher dimensions.

Figure 1: Classification accuracy on appearance-based image data for both projection methods. (a) PCA-projected data; (b) random-projected data.

As the dimension of the data increases, the performance of DIAG decreases to a level below that of LogReg on the PCA-projected data. DIAG is much more competitive when the data is pre-processed using random projection, beating LogReg in all but the smallest dimensions. DIAG's performance drop-off for the PCA-projected data occurs where the number of eigenvectors first exceeds the number of training samples. FKM performs consistently at or near the top when pre-processed using either projection method and, along with LogReg, has nearly identical classification accuracies for both projection methods. Unlike with the randomly-projected data, LDA is occasionally able to run without becoming numerically unstable on the PCA-projected data, because the PCA subspace has just enough variance in these dimensions that the covariance matrix used by LDA is non-singular.

Figure 3 shows the results for the mfeat and isolet experiments. The differences in performance between LogReg, DIAG, and FKM are much less pronounced on these two data sets for both projection methods. This is most likely a result of the two data sets being much larger and possibly uni-modal (which assists LogReg and LDA). However, the graphs of the three data sets' results share similar features. Random projection shows a much more dramatic reduction in performance once too many dimensions are dropped, compared to PCA. This is because the PCA subspace dimensions are sorted by how much variance they capture, while there is nothing special about the first random dimensions. Also, DIAG's performance eventually drops as an increasing number of noisy PCA subspace dimensions are retained, while random projection shows consistent performance across dimensions.

In addition to dimension and projection method, the training set size |X| is also varied for these two data sets, but the dominance of FKM over DIAG and LogReg seen with the appearance-based image data is not replicated. In fact, LogReg, due to its much lower number of parameters, is unaffected by a smaller |X|. Figure 5, which displays the results for DIAG and FKM for both projection methods and various λ values across dimension and |X| on the mfeat and isolet data sets, shows a similar relationship between λ and classification accuracy: higher λ values perform best up to around 0.9, after which accuracy drops quickly, to the extent that the largest λ values are among the worst performing.

Figure 2: Classification accuracy with respect to dimension for the DIAG and FKM versions with several λ values on test data, for both projection methods, on the appearance-based image data. (a) PCA with FKM; (b) PCA with DIAG; (c) random with FKM; (d) random with DIAG.

Figure 2 shows how the classification accuracies for the two MoG versions are affected by the choice of λ for both projection schemes on the appearance-based data. Figures 2a and 2c show a preference toward higher λ values for FKM, with the notable exception of λ = 1. Figure 2b shows a less clear relationship for λ with DIAG on the PCA-projected data. Figure 2c shows the performance of FKM with random projection at λ = 1 rising as the dimension increases, and there is no reason to believe that it will not eventually become even with the other λ values. Figure 2d reveals no apparent favorite value of λ for DIAG with random projection, although λ = 1 is inconsistent and less desirable than the other values on average.

5 CONCLUSIONS

By a very small margin, FKM with a high λ value and random projection keeping many dimensions (little to no dimensionality reduction) gives the best performance.

Figure 3: Classification accuracy on the mfeat and isolet data sets for both the PCA and RND projection methods. (a) mfeat/PCA; (b) mfeat/RND; (c) isolet/PCA; (d) isolet/RND.

Figure 4: Classification accuracy of various λ values with DIAG on the mfeat data set for the PCA projection method. The number beneath each plot is the percentage of the 2000 data points used for training.

The difference in performance for DIAG as the dimension increases between the two projection methods is most likely a result of the later PCA dimensions being uninformative noise. Recall that DIAG is unable to remove any variance; therefore, noisy dimensions have a more negative effect on it than on its FKM counterpart. This drop in performance is also present in the training data. At the lower dimensions, random projection pre-processing performs worse than PCA. However, once the dimensionality reaches a certain level, the performance of FKM and LogReg remains steady for both projection methods.

Figure 4 shows an odd relationship between the number of PCA dimensions kept and the classification accuracy of DIAG as |X| decreases. As expected, the dimension at which DIAG performance drops off for all λ values decreases along with |X|. However, as |X| shrinks, the magnitude of this dip decreases as well. This may be explained by a smaller |X| decreasing the specificity of the PCA subspace to the training data. The dropped PCA dimensions for a large |X| are almost pure noise, while the PCA subspace dimensions computed from a small |X| will be more general. Therefore, keeping all PCA dimensions computed from a small |X| will result in better performance than for a large |X| once the dimension exceeds |X|, in part because these later dimensions are now more like a random projection.

Figure 5: Classification accuracy of various λ values for both covariance regularization methods, both projection methods, and the mfeat and isolet data sets. (a) DIAG/PCA/mfeat; (b) FKM/PCA/mfeat; (c) DIAG/RND/mfeat; (d) FKM/RND/mfeat; (e) DIAG/PCA/isolet; (f) FKM/PCA/isolet; (g) DIAG/RND/isolet; (h) FKM/RND/isolet.

Most importantly, FKM with a high λ value is consistently at the top in these experiments. This tells us that some covariance information is necessary, but a little covariance goes a long way toward classifying this data and generalizing to unseen data, while adding more covariance primarily increases the risk of over-fitting with little chance of improving classification accuracy. In addition to the promise of projection onto a high-dimensional random basis combined with FKM with λ < 1, our results show a simpler relationship between the choice of λ and the number of dimensions to keep when projecting with a random projection. By comparison, PCA pre-processing, still the most popular dimensionality reduction technique in many domains, is fussy, and its optimal number of dimensions differs for each data set. If a near-optimal number of subspace dimensions is not chosen, there is a great deal of variation in performance between λ values for DIAG. Another option is to apply FKM when pre-processing via PCA projection. However, computation of a random basis is much faster than PCA computation.

Either way, it appears important to involve a large degree of covariance regularization, through either random projection or FKM with high λ values. For all data sets, as the dimension of the randomly generated basis increases, the disparity in performance between the varying levels of covariance regularization decreases. If this trend were to continue, one could expect FKM with λ = 1 to become competitive with all λ values for FKM and DIAG, and k-means could replace MoG. In this case, projecting data into a higher dimension and using k-means would yield an algorithm with superior accuracy and improved computational complexity. Investigating this further is left to future work.

These observations span the three data sets used in this paper, which represent the use of raw data and of features computed from the data, and situations with both sufficient and insufficient training data. Isolet and mfeat may be uni-modal and are a good fit for LogReg when the training set size is diminished. Otherwise, the experimental results indicate that random projection with little dimensionality reduction and application of FKM with a high λ is a safe bet for obtaining good classification accuracy.

References

Belhumeur, P., Hespanha, J. & Kriegman, D. (1997), Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7).
Bishop, C. M. (2007), Pattern Recognition and Machine Learning, Springer.
Candes, E. & Tao, T. (2006), Near-optimal signal recovery from random projections: universal encoding strategies, IEEE Transactions on Information Theory 52.
Dasgupta, S. (2000), Experiments with random projections, in Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence.
Deegalla, S. & Bostrom, H. (2006), Reducing high-dimensional data by principal component analysis vs. random projection for nearest neighbor classification, in ICMLA '06, IEEE Computer Society, Washington, DC, USA.
Elliott, D. L. (2009), Covariance regularization in mixture of Gaussians for high-dimensional image classification, Master's thesis, Colorado State University.
Frank, A. & Asuncion, A. (2010), UCI machine learning repository, University of California, Irvine.
Hammoud, R. & Mohr, R. (2000), Mixture densities for video objects recognition, in International Conference on Pattern Recognition.
Hinton, G., Dayan, P. & Revow, M. (1997), Modelling the manifolds of images of handwritten digits, IEEE Transactions on Neural Networks 8(1).
Mitchell, T. M. (1997), Machine Learning, McGraw-Hill, Boston.
Moghaddam, B. & Pentland, A. (1997), Probabilistic visual learning for object representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7).
Robinson, J. A. (2009), Covariance estimation in full- and reduced-dimensionality image classification, Image and Vision Computing 27(8).
Schäfer, J. & Strimmer, K. (2005), A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics, Statistical Applications in Genetics and Molecular Biology 4(1).
Tipping, M. & Bishop, C. (1999), Mixtures of probabilistic principal component analysers, Neural Computation 11(2).
Ye, J. & Wang, T. (2006), Regularized discriminant analysis for high dimensional, low sample size data, in KDD '06, ACM, New York, NY, USA.
