PCA and LDA. Man-Wai MAK


PCA and LDA
Man-Wai MAK
Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University
enmwmak@polyu.edu.hk, http://www.eie.polyu.edu.hk/~mwmak
References:
S.J.D. Prince, Computer Vision: Models, Learning and Inference, Cambridge University Press, 2012.
C. Bishop, Pattern Recognition and Machine Learning, Appendix E, Springer.
October 26, 2018

Overview
1. Dimension Reduction
   - Why Dimension Reduction
   - Dimension Reduction: Reduce to 1-Dim
2. Principal Component Analysis
   - Derivation of PCA
   - PCA on High-Dimensional Data
   - Eigenface
3. Linear Discriminant Analysis
   - LDA on 2-Class Problems
   - LDA on Multi-class Problems

Why Dimension Reduction
Many applications produce high-dimensional vectors:
- In face recognition, the feature dimension equals the number of pixels in the image, which can be very large.
- In hand-writing digit recognition, if a digit occupies 28 x 28 pixels, the dimension is 784.
- In speaker recognition, the dimension per utterance can be very high.
High-dimensional feature vectors can easily cause the curse-of-dimensionality problem.
Redundancy: Some of the elements in the feature vectors are strongly correlated, meaning that knowing one element also tells us about some other elements.
Irrelevancy: Some elements in the feature vectors are irrelevant to the classification task.

Dimension Reduction
Given a feature vector $x \in \mathbb{R}^D$, dimensionality reduction aims to find a low-dimensional representation $h \in \mathbb{R}^M$ that can approximately explain $x$:
$$x \approx f(h, \theta) \quad (1)$$
where $f(\cdot, \cdot)$ is a function that takes the hidden variable $h$ and a set of parameters $\theta$, and $M \ll D$.
Typically, we choose the function family $f(\cdot, \cdot)$ and then learn $h$ and $\theta$ from training data.
Least squares criterion: Given $N$ training vectors $\mathcal{X} = \{x_1, \ldots, x_N\}$, $x_i \in \mathbb{R}^D$, we find the parameters $\theta$ and latent variables $h_i$ that minimize the sum of squared errors:
$$\hat{\theta}, \{\hat{h}_i\}_{i=1}^N = \operatorname*{argmin}_{\theta, \{h_i\}_{i=1}^N} \sum_{i=1}^N [x_i - f(h_i, \theta)]^T [x_i - f(h_i, \theta)] \quad (2)$$

Dimension Reduction: Reduce to 1-Dim
Approximate vector $x_i$ by a scalar value $h_i$ plus the global mean $\mu$:
$$x_i \approx \phi h_i + \mu, \quad \text{where } \mu = \frac{1}{N}\sum_{i=1}^N x_i, \quad \phi \in \mathbb{R}^{D \times 1}$$
Assuming $\mu = 0$ or that the vectors have been mean-subtracted, i.e., $x_i \leftarrow x_i - \mu$ for all $i$, we have $x_i \approx \phi h_i$.
The least squares criterion becomes:
$$\hat{\phi}, \{\hat{h}_i\}_{i=1}^N = \operatorname*{argmin}_{\phi, \{h_i\}_{i=1}^N} E(\phi, \{h_i\}) = \operatorname*{argmin}_{\phi, \{h_i\}_{i=1}^N} \sum_{i=1}^N [x_i - \phi h_i]^T [x_i - \phi h_i] \quad (3)$$

Dimension Reduction: Reduce to 1-Dim
Eq. 3 has a problem in that it does not have a unique solution: if we multiply $\phi$ by any constant $\alpha$ and divide the $h_i$'s by the same constant, we get the same cost, i.e., $(\alpha\phi)\frac{h_i}{\alpha} = \phi h_i$.
We make the solution unique by constraining $\|\phi\|^2 = 1$ using a Lagrange multiplier:
$$\begin{aligned} L(\phi, \{h_i\}) &= E(\phi, \{h_i\}) + \lambda(\phi^T\phi - 1) \\ &= \sum_{i=1}^N (x_i - \phi h_i)^T (x_i - \phi h_i) + \lambda(\phi^T\phi - 1) \\ &= \sum_{i=1}^N \left( x_i^T x_i - 2 h_i \phi^T x_i + h_i^2 \right) + \lambda(\phi^T\phi - 1) \end{aligned}$$

Dimension Reduction: Reduce to 1-Dim
Setting $\frac{\partial L}{\partial \phi} = 0$ and $\frac{\partial L}{\partial h_i} = 0$, we obtain:
$$\sum_i x_i \hat{h}_i = \lambda\hat{\phi} \quad \text{and} \quad \hat{\phi}^T x_i = \hat{h}_i \;\Rightarrow\; \hat{h}_i = x_i^T \hat{\phi}$$
Hence,
$$\sum_i x_i \big(x_i^T \hat{\phi}\big) = \Big(\sum_i x_i x_i^T\Big)\hat{\phi} = \lambda\hat{\phi} \;\Rightarrow\; S\hat{\phi} = \lambda\hat{\phi}$$
where $S$ is the covariance matrix of the training data (note that the $x_i$'s have been mean-subtracted).
Therefore, $\hat{\phi}$ is the first eigenvector of $S$.
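This result can be checked numerically. Below is a minimal NumPy sketch (not part of the original slides); the toy data and variable names are illustrative assumptions.

```python
# Verify that the leading eigenvector of S gives the best single-direction
# least-squares approximation of mean-subtracted data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0, 0],
                            cov=[[4, 2, 0], [2, 3, 0], [0, 0, 1]],
                            size=500)             # N x D data matrix
X = X - X.mean(axis=0)                            # mean-subtract

S = X.T @ X                                       # (unnormalized) covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)              # eigenvalues in ascending order
phi = eigvecs[:, -1]                              # eigenvector with largest eigenvalue

h = X @ phi                                       # h_i = x_i^T phi
err = np.sum((X - np.outer(h, phi)) ** 2)         # sum of squared reconstruction errors
print(err)                                        # smaller than for any other unit phi
```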

Dimension Reduction: Reduce to 1-Dim
[Figure (from Prince, Computer Vision: Models, Learning and Inference): Reduction to a single dimension. a) Original data and the direction of maximum variance. b) The data are projected onto this direction to produce a one-dimensional representation. c) To reconstruct the data, we re-multiply by the direction vector; most of the original variation is retained.]
PCA extends this model to project high-dimensional data onto the K orthogonal directions with the most variance, producing a K-dimensional representation.

Dimension Reduction: 3D to 2D
The goal is to find a small number of axes in which the data have the highest variability. These axes may not be parallel to the original axes.
E.g., projection from a 3D space $(x_1, x_2, x_3)$ to a 2D space.
[Figure: 3D data points projected onto a 2D subspace.]

Principal Component Analysis
In PCA, the hidden variables $\{h_i\}$ are multi-dimensional and $\phi$ becomes a rectangular matrix $\Phi = [\phi_1\ \phi_2\ \cdots\ \phi_M]$, where $M \ll D$.
Each component of $h_i$ weights one column of the matrix $\Phi$ so that the data are approximated as
$$x_i \approx \Phi h_i, \quad i = 1, \ldots, N$$
The cost function is
$$\hat{\Phi}, \{\hat{h}_i\}_{i=1}^N = \operatorname*{argmin}_{\Phi, \{h_i\}_{i=1}^N} E\big(\Phi, \{h_i\}_{i=1}^N\big) = \operatorname*{argmin}_{\Phi, \{h_i\}_{i=1}^N} \sum_{i=1}^N [x_i - \Phi h_i]^T [x_i - \Phi h_i] \quad (4)$$
Note that we have defined $\theta \equiv \Phi$ in Eq. 2.

Principal Component Analysis
To solve the non-uniqueness problem in Eq. 4, we enforce $\phi_d^T \phi_d = 1$, $d = 1, \ldots, M$, using a set of Lagrange multipliers $\{\lambda_d\}_{d=1}^M$:
$$\begin{aligned} L(\Phi, \{h_i\}) &= \sum_{i=1}^N (x_i - \Phi h_i)^T (x_i - \Phi h_i) + \sum_{d=1}^M \lambda_d\big(\phi_d^T \phi_d - 1\big) \\ &= \sum_{i=1}^N (x_i - \Phi h_i)^T (x_i - \Phi h_i) + \operatorname{tr}\{\Phi \Lambda_M \Phi^T - \Lambda\} \\ &= \sum_{i=1}^N \left( x_i^T x_i - 2 h_i^T \Phi^T x_i + h_i^T h_i \right) + \operatorname{tr}\{\Phi \Lambda_M \Phi^T - \Lambda\} \end{aligned} \quad (5)$$
where $h_i \in \mathbb{R}^M$, $\Lambda = \operatorname{diag}\{\lambda_1, \ldots, \lambda_M, 0, \ldots, 0\} \in \mathbb{R}^{D \times D}$, $\Lambda_M = \operatorname{diag}\{\lambda_1, \ldots, \lambda_M\} \in \mathbb{R}^{M \times M}$, and $\Phi = [\phi_1\ \phi_2\ \cdots\ \phi_M] \in \mathbb{R}^{D \times M}$.

Principal Component Analysis
Setting $\frac{\partial L}{\partial \Phi} = 0$ and $\frac{\partial L}{\partial h_i} = 0$, we obtain:
$$\sum_i x_i \hat{h}_i^T = \hat{\Phi}\Lambda_M \quad \text{and} \quad \hat{\Phi}^T x_i = \hat{h}_i \;\Rightarrow\; \hat{h}_i^T = x_i^T \hat{\Phi}$$
where we have used the identities $\frac{\partial}{\partial X}\operatorname{tr}\{XBX^T\} = XB^T + XB$ and $\frac{\partial\, a^T X^T b}{\partial X} = b a^T$.
Therefore,
$$\sum_i x_i x_i^T \hat{\Phi} = \hat{\Phi}\Lambda_M \;\Rightarrow\; S\hat{\Phi} = \hat{\Phi}\Lambda_M \quad (6)$$
So, $\hat{\Phi}$ comprises the $M$ eigenvectors of $S$ with the largest eigenvalues.
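The solution can be implemented in a few lines. The following is a minimal NumPy sketch (not the lecture's code); the function name, toy data, and shapes are illustrative assumptions.

```python
# PCA by eigen-decomposition of the D x D scatter matrix S.
import numpy as np

def pca(X, M):
    """X: N x D data matrix. Returns (Phi, H, mu) such that X ~ H @ Phi.T + mu."""
    mu = X.mean(axis=0)
    Xc = X - mu                                   # mean-subtract
    S = Xc.T @ Xc                                 # D x D scatter matrix
    eigvals, eigvecs = np.linalg.eigh(S)          # ascending eigenvalues
    Phi = eigvecs[:, ::-1][:, :M]                 # top-M eigenvectors as columns
    H = Xc @ Phi                                  # projected coordinates h_i (rows)
    return Phi, H, mu

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                    # 200 samples in 10 dimensions
Phi, H, mu = pca(X, M=3)
X_rec = H @ Phi.T + mu                            # reconstruction from 3 components
```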

Interpretation of $\Lambda_M$
Denote $X$ as a $D \times N$ centered data matrix whose $n$-th column is given by $\big(x_n - \frac{1}{N}\sum_{i=1}^N x_i\big)$.
The projected data matrix is given by $Y = \hat{\Phi}^T X$.
The covariance matrix of the transformed data is
$$YY^T = \big(\hat{\Phi}^T X\big)\big(\hat{\Phi}^T X\big)^T = \hat{\Phi}^T X X^T \hat{\Phi} = \hat{\Phi}^T \hat{\Phi}\Lambda_M = \Lambda_M$$
where the third equality follows from the eigen-equation in Eq. 6 and the last from $\hat{\Phi}^T\hat{\Phi} = \mathbf{I}$.
Therefore, the eigenvalues represent the variances of the projected vectors.
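A quick numerical check (illustrative, assuming the pca() sketch above): the covariance of the projected data should be approximately diagonal, with the top-M eigenvalues of S on its diagonal.

```python
# Y is the M x N projected data matrix; Y @ Y.T ~ diag(lambda_1, ..., lambda_M).
Y = H.T
print(np.round(Y @ Y.T, 3))
```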

PCA on High-Dimensional Data
When the dimension $D$ of $x_i$ is very high, computing $S$ and its eigenvectors directly is impractical.
However, the rank of $S$ is limited by the number of training examples: if there are $N$ training examples, there will be at most $N - 1$ eigenvectors with non-zero eigenvalues.
If $N \ll D$, the principal components can be computed more easily. Let $X$ be a data matrix comprising the mean-subtracted $x_i$'s in its columns. Then $S = XX^T$, and the eigen-decomposition of $S$ is given by
$$S\phi_i = XX^T \phi_i = \lambda_i \phi_i$$
Instead of performing eigen-decomposition of $XX^T$, we perform eigen-decomposition of $X^T X$:
$$X^T X \psi_i = \lambda_i \psi_i \quad (7)$$

Principal Component Analysis
Pre-multiplying both sides of Eq. 7 by $X$, we obtain
$$XX^T(X\psi_i) = \lambda_i (X\psi_i)$$
This means that if $\psi_i$ is an eigenvector of $X^T X$, then $\phi_i = X\psi_i$ is an eigenvector of $S = XX^T$.
So, all we need is to compute the $N - 1$ eigenvectors of $X^T X$, which has size $N \times N$.
Note that the $\phi_i$'s computed in this way are not normalized, so we need to normalize them:
$$\phi_i = \frac{X\psi_i}{\|X\psi_i\|}, \quad i = 1, \ldots, N-1$$
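A minimal sketch of this small-N trick (not from the slides); the data shapes and names are illustrative assumptions.

```python
# Compute eigenvectors of the N x N matrix X^T X instead of the D x D matrix X X^T.
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 10000                                  # few samples, very high dimension
A = rng.normal(size=(N, D))                        # N x D raw data
X = (A - A.mean(axis=0)).T                         # D x N, columns are mean-subtracted x_i

G = X.T @ X                                        # N x N matrix X^T X
eigvals, Psi = np.linalg.eigh(G)                   # ascending eigenvalues
Psi = Psi[:, ::-1][:, :N - 1]                      # keep the N-1 leading eigenvectors

Phi = X @ Psi                                      # un-normalized eigenvectors of X X^T
Phi /= np.linalg.norm(Phi, axis=0, keepdims=True)  # normalize each column
```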

Example Application of PCA: Eigenface
Eigenface is one of the most well-known applications of PCA.
[Figure: a face is approximated as the mean face $\mu$ plus a weighted sum of eigenfaces, $h_1\phi_1 + h_2\phi_2 + \cdots + h_{10}\phi_{10} + \cdots$; faces reconstructed using 399 eigenfaces closely match the original faces.]

Example Application of PCA: Eigenface
Faces reconstructed using different numbers of principal components (eigenfaces): original, 1 PC, 20 PCs, 50 PCs, 100 PCs, 200 PCs, 399 PCs.
See Lab2 of EIE4105 (mwmak/myteaching.htm) for an implementation; a reconstruction sketch is shown below.
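The sketch below (not the Lab2 code) reconstructs images from a varying number of principal components, assuming the pca() sketch above and an N x D matrix `faces` of vectorized face images; the data here are random placeholders.

```python
# Reconstruct data from the first k principal components for several k.
import numpy as np

def reconstruct(faces, num_pcs):
    Phi, H, mu = pca(faces, M=max(num_pcs))
    recons = {}
    for k in num_pcs:
        # Keep only the first k components and their coefficients.
        recons[k] = H[:, :k] @ Phi[:, :k].T + mu
    return recons

faces = np.random.default_rng(0).normal(size=(400, 784))   # placeholder "faces"
recons = reconstruct(faces, num_pcs=[1, 20, 50, 100, 200, 399])
```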

Limitations of PCA
PCA will fail if the subspace is non-linear; PCA can only find linear subspaces.
[Figure: data lying in a linear subspace (PCA is fine) vs. data lying on a nonlinear subspace (PCA fails).]
Solution: use a non-linear embedding such as ISOMAP or a DNN.

Fisher Discriminant Analysis
FDA is a classification method that separates data into two classes. But FDA can also be considered a supervised dimension reduction method that reduces the dimension to 1.
[Figure: projecting the data onto the line joining the two class means vs. projecting onto the FDA subspace.]

Fisher Discriminant Analysis
The idea of FDA is to find a 1-D projection such that the projected data give a large separation between the means of the two classes while also having a small variance within each class, thereby minimizing the class overlap.
Assume that the training data are projected onto a 1-D space using
$$y_n = \mathbf{w}^T x_n, \quad n = 1, \ldots, N.$$
Fisher criterion:
$$J(\mathbf{w}) = \frac{\text{Between-class scatter}}{\text{Within-class scatter}} = \frac{\mathbf{w}^T S_B \mathbf{w}}{\mathbf{w}^T S_W \mathbf{w}}$$
where
$$S_B = (\mu_2 - \mu_1)(\mu_2 - \mu_1)^T \quad \text{and} \quad S_W = \sum_{k=1}^{2}\sum_{n \in \mathcal{C}_k} (x_n - \mu_k)(x_n - \mu_k)^T$$
are the between-class and within-class scatter matrices, respectively, and $\mu_1$ and $\mu_2$ are the class means.

Fisher Discriminant Analysis
Note that only the direction of $\mathbf{w}$ matters. Therefore, we can always find a $\mathbf{w}$ that leads to $\mathbf{w}^T S_W \mathbf{w} = 1$.
The maximization of $J(\mathbf{w})$ can be rewritten as:
$$\max_{\mathbf{w}} \; \mathbf{w}^T S_B \mathbf{w} \quad \text{subject to} \quad \mathbf{w}^T S_W \mathbf{w} = 1$$
The Lagrangian function is
$$L(\mathbf{w}, \lambda) = \tfrac{1}{2}\mathbf{w}^T S_B \mathbf{w} - \tfrac{1}{2}\lambda\big(\mathbf{w}^T S_W \mathbf{w} - 1\big)$$
Setting $\frac{\partial L}{\partial \mathbf{w}} = 0$, we obtain
$$S_B\mathbf{w} - \lambda S_W\mathbf{w} = 0 \;\Rightarrow\; S_B\mathbf{w} = \lambda S_W\mathbf{w} \;\Rightarrow\; \big(S_W^{-1}S_B\big)\mathbf{w} = \lambda\mathbf{w} \quad (8)$$
So, $\mathbf{w}$ is the first eigenvector of $S_W^{-1}S_B$.
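A minimal NumPy sketch of two-class FDA (not from the slides); the toy data and names are illustrative assumptions.

```python
# Two-class FDA: w is the leading eigenvector of S_W^{-1} S_B.
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0, 0], scale=1.0, size=(100, 2))        # class 1 samples
X2 = rng.normal(loc=[3, 2], scale=1.0, size=(100, 2))        # class 2 samples

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S_B = np.outer(mu2 - mu1, mu2 - mu1)                          # between-class scatter
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)   # within-class scatter

eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))   # S_W^{-1} S_B
w = np.real(eigvecs[:, np.argmax(np.real(eigvals))])          # leading eigenvector

y1, y2 = X1 @ w, X2 @ w                                       # 1-D projections
```

For the two-class case, the solution is proportional to $S_W^{-1}(\mu_2 - \mu_1)$, so the explicit eigen-decomposition can be avoided if desired.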

LDA on Multi-class Problems
For multiple classes ($K > 2$ and $D > K$), we can use LDA to project $D$-dimensional vectors to $M$-dimensional vectors, where $1 < M < K$.
$\mathbf{w}$ is extended to a matrix $\mathbf{W} = [\mathbf{w}_1 \cdots \mathbf{w}_M]$ and the projected scalar $y_n$ is extended to a vector $\mathbf{y}_n$:
$$\mathbf{y}_n = \mathbf{W}^T(x_n - \mu), \quad \text{where } y_{nj} = \mathbf{w}_j^T(x_n - \mu), \; j = 1, \ldots, M$$
and $\mu$ is the global mean of the training vectors.
The between-class and within-class scatter matrices become
$$S_B = \sum_{k=1}^K N_k (\mu_k - \mu)(\mu_k - \mu)^T, \quad S_W = \sum_{k=1}^K \sum_{n \in \mathcal{C}_k} (x_n - \mu_k)(x_n - \mu_k)^T$$
where $N_k$ is the number of samples in class $k$, i.e., $N_k = |\mathcal{C}_k|$.

LDA on Multi-class Problems
The LDA criterion function:
$$J(\mathbf{W}) = \frac{\text{Between-class scatter}}{\text{Within-class scatter}} = \operatorname{Tr}\left\{\big(\mathbf{W}^T S_W \mathbf{W}\big)^{-1}\big(\mathbf{W}^T S_B \mathbf{W}\big)\right\}$$
Constrained optimization:
$$\max_{\mathbf{W}} \operatorname{Tr}\{\mathbf{W}^T S_B \mathbf{W}\} \quad \text{subject to} \quad \mathbf{W}^T S_W \mathbf{W} = \mathbf{I}$$
where $\mathbf{I}$ is an $M \times M$ identity matrix.
Note that unlike PCA in Eq. 5, because of the matrix $S_W$ in the constraint, we need to find one $\mathbf{w}_j$ at a time.
Note also that the constraint $\mathbf{W}^T S_W \mathbf{W} = \mathbf{I}$ suggests that the $\mathbf{w}_j$'s may not be orthogonal to each other.

LDA on Multi-class Problems
To find $\mathbf{w}_j$, we write the Lagrangian function as:
$$L(\mathbf{w}_j, \lambda_j) = \mathbf{w}_j^T S_B \mathbf{w}_j - \lambda_j\big(\mathbf{w}_j^T S_W \mathbf{w}_j - 1\big)$$
Using Eq. 8, the optimal solution of $\mathbf{w}_j$ satisfies
$$\big(S_W^{-1}S_B\big)\mathbf{w}_j = \lambda_j \mathbf{w}_j$$
Therefore, $\mathbf{W}$ comprises the first $M$ eigenvectors of $S_W^{-1}S_B$. A more formal proof can be found in [1].
As the maximum rank of $S_B$ is $K - 1$, $S_W^{-1}S_B$ has at most $K - 1$ non-zero eigenvalues. As a result, $M$ can be at most $K - 1$.
After the projection, the vectors $\mathbf{y}_n$ can be used to train a classifier (e.g., SVM) for classification.
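A minimal multi-class LDA sketch (not from the slides): project $D$-dimensional vectors to $M \le K-1$ dimensions using the leading eigenvectors of $S_W^{-1} S_B$. The data, labels, and names are illustrative assumptions.

```python
# Multi-class LDA via the eigenproblem (S_W^{-1} S_B) w = lambda w.
import numpy as np

def lda(X, labels, M):
    """X: N x D data, labels: length-N class labels. Returns a D x M matrix W."""
    classes = np.unique(labels)
    mu = X.mean(axis=0)
    D = X.shape[1]
    S_B = np.zeros((D, D))
    S_W = np.zeros((D, D))
    for k in classes:
        Xk = X[labels == k]
        mu_k = Xk.mean(axis=0)
        S_B += len(Xk) * np.outer(mu_k - mu, mu_k - mu)      # between-class scatter
        S_W += (Xk - mu_k).T @ (Xk - mu_k)                   # within-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(-np.real(eigvals))                    # largest eigenvalues first
    return np.real(eigvecs[:, order[:M]])

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(50, 5)) for c in range(4)])   # K = 4 classes
labels = np.repeat(np.arange(4), 50)
W = lda(X, labels, M=3)                          # M can be at most K - 1 = 3
Y = (X - X.mean(axis=0)) @ W                     # projected vectors y_n
```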

PCA vs. LDA
Project 784-dimensional vectors derived from handwritten digits onto a 3-dimensional space. For LDA, the subspace is formed by the 3 eigenvectors with the largest eigenvalues, i.e., $\mathbf{W}$ is a $784 \times 3$ matrix, or $\mathbf{W} \in \mathbb{R}^{784 \times 3}$.
[Figure: 3-D scatter plots of the digit vectors projected by LDA and by PCA.]
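For an illustrative comparison (assuming the pca() and lda() sketches above), the same vectors can be projected both ways; the variables here are placeholders, not the lecture's handwritten-digit data.

```python
# Compare unsupervised (PCA) and supervised (LDA) 3-D projections of the same data.
Phi, H_pca, mu = pca(X, M=3)        # directions of maximum variance
W = lda(X, labels, M=3)             # directions of maximum class separation
Y_lda = (X - mu) @ W
```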

References
[1] Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition. San Diego, CA, USA: Academic Press.
