Notes on Implementation of Component Analysis Techniques


Dr. Stefanos Zafeiriou
January 2015

1 Computing Principal Component Analysis

Assume that we have a matrix of centred data observations

    X = [x_1 - µ, ..., x_N - µ]                                                  (1)

where µ denotes the mean vector. X has size F × N, where F is the number of dimensions and N is the number of observations. Their covariance matrix is given by

    S_t = (1/N) Σ_{i=1}^{N} (x_i - µ)(x_i - µ)^T = (1/N) X X^T                   (2)

In Principal Component Analysis (PCA), we aim to maximize the variance of each dimension by maximizing

    W_0 = arg max_W tr(W^T S_t W)   subject to   W^T W = I                       (3)

The solution of Eq. 3 can be derived by solving

    S_t W = W Λ                                                                  (4)

Thus, we need to perform eigenanalysis on S_t. If we want to keep d principal components, the computational cost of the above operation is O(dF^2). If F is large, this computation can be quite expensive.

Lemma 1. Let us assume that B = XX^T and C = X^T X. It can be proven that B and C have the same positive eigenvalues Λ and, assuming that N < F, the eigenvectors U of B and the eigenvectors V of C are related as U = XVΛ^{-1/2}.
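To make Lemma 1 concrete, here is a minimal numpy sketch; the sizes F and N, the random data and all variable names are illustrative assumptions rather than anything prescribed by the notes.

```python
import numpy as np

# Illustrative setting: many dimensions F, few observations N (N < F).
F, N = 1000, 50
rng = np.random.default_rng(0)
X = rng.standard_normal((F, N))
X = X - X.mean(axis=1, keepdims=True)             # centre the observations (columns)

# Small eigenproblem on C = X^T X (N x N) instead of B = X X^T (F x F).
evals, V = np.linalg.eigh(X.T @ X)
keep = evals > evals.max() * 1e-10                # drop the zero eigenvalue caused by centring
evals, V = evals[keep], V[:, keep]

# Lemma 1: U = X V Lambda^{-1/2} contains the eigenvectors of X X^T.
U = X @ V / np.sqrt(evals)

assert np.allclose((X @ X.T) @ U, U * evals)      # B U = U Lambda
assert np.allclose(U.T @ U, np.eye(U.shape[1]))   # orthonormal columns
```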

Figure 1: Example of data whitening using the PCA projection matrix W = UΛ^{-1/2}.

Using Lemma 1 we can compute the eigenvectors U of S_t in O(N^3). The eigenanalysis of X^T X is denoted by

    X^T X = VΛV^T                                                                (5)

where V is an N × (N - 1) matrix with the eigenvectors as columns and Λ is an (N - 1) × (N - 1) diagonal matrix with the eigenvalues. Given that V^T V = I but VV^T ≠ I, we have

    X^T X = VΛV^T
    U = XVΛ^{-1/2}
    U^T XX^T U = Λ^{-1/2} V^T X^T X X^T X V Λ^{-1/2}
               = Λ^{-1/2} (V^T V) Λ (V^T V) Λ (V^T V) Λ^{-1/2}                   (6)
               = Λ

where each factor V^T V equals I. The pseudocode for computing PCA is

Algorithm 1 Principal Component Analysis
1: procedure PCA
2:   Compute the dot-product matrix X^T X, with entries [X^T X]_{ij} = (x_i - µ)^T (x_j - µ)
3:   Eigenanalysis: X^T X = VΛV^T
4:   Compute eigenvectors: U = XVΛ^{-1/2}
5:   Keep a specific number of first components: U_d = [u_1, ..., u_d]
6:   Compute d features: Y = U_d^T X

Now, the covariance matrix of Y is YY^T = U^T XX^T U = Λ. The final solution of Eq. 3 is given by the projection matrix

    W = UΛ^{-1/2}                                                                (7)

which normalizes the data to have unit variance (Figure 1). This procedure is called whitening (or sphering).
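A compact implementation of Algorithm 1, including the whitening of Eq. 7, might look as follows. This is only a sketch under the conventions used above (observations as columns, d <= N - 1); the function name pca_whiten is mine, not from the notes.

```python
import numpy as np

def pca_whiten(X, d):
    """PCA via the dot-product matrix (Algorithm 1) followed by whitening (Eq. 7).

    X : (F, N) data matrix with observations as columns.
    d : number of principal components to keep (d <= N - 1).
    Returns the whitening projection W of size (F, d) and the features Y = W^T X.
    """
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu                                    # centred data

    evals, V = np.linalg.eigh(Xc.T @ Xc)           # eigenanalysis of X^T X (Eq. 5)
    top = np.argsort(evals)[::-1][:d]              # d largest eigenvalues
    evals, V = evals[top], V[:, top]

    U = Xc @ V / np.sqrt(evals)                    # eigenvectors of X X^T (Lemma 1)
    W = U / np.sqrt(evals)                         # W = U Lambda^{-1/2} (Eq. 7)
    Y = W.T @ Xc                                   # whitened features: Y Y^T = I
    return W, Y
```

Dividing by sqrt(evals) twice is exactly the composition of Lemma 1 (U = XVΛ^{-1/2}) with the whitening step W = UΛ^{-1/2}.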

2 Computing Linear Discriminant Analysis

As explained before (Section 1), PCA finds the principal components that maximize the data variance without taking into account the class labels. In contrast to this, Linear Discriminant Analysis (LDA) computes the linear directions that maximize the separation between multiple classes. This is mathematically expressed as maximizing

    W_0 = arg max_W tr(W^T S_b W)   subject to   W^T S_w W = I                   (8)

Assume that we have C classes, denoted by c_i = [x_1, ..., x_{N_{c_i}}], i = 1, ..., C, where each x_j has F dimensions and µ(c_i) is the mean vector of class c_i. Thus, the overall data matrix is X = [c_1, ..., c_C] with size F × N (N = Σ_{i=1}^{C} N_{c_i}) and µ is the overall mean (mean of means). S_w is the within-class scatter matrix

    S_w = Σ_{j=1}^{C} S_j = Σ_{j=1}^{C} Σ_{x_i ∈ c_j} (x_i - µ(c_j))(x_i - µ(c_j))^T     (9)

which has rank(S_w) = min(F, N - C). Moreover, S_b is the between-class scatter matrix

    S_b = Σ_{j=1}^{C} N_{c_j} (µ(c_j) - µ)(µ(c_j) - µ)^T                         (10)

which has rank(S_b) = min(F, C - 1). The solution of Eq. 8 is given by the generalized eigenvalue problem

    S_b W = S_w W Λ                                                              (11)

thus W_0 corresponds to the eigenvectors of S_w^{-1} S_b that have the largest eigenvalues. In order to deal with the singularity of S_w, we can do the following steps, illustrated in the sketch below:

1. Perform PCA on our data matrix X to reduce the dimensions to N - C using the eigenvectors U.
2. Solve LDA in this reduced space and get Q, which has C - 1 columns.
3. Compute the total transform as W = UQ.

Unfortunately, if you follow the above procedure it is possible that important discriminative information is lost. In the following, we show how the components of LDA can be computed by applying a simultaneous diagonalization procedure.
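For illustration, one possible numpy/scipy sketch of the three-step workaround above is given below; the function name, the use of scipy.linalg.eigh for the generalized eigenproblem of Eq. 11, and the assumption that S_w is nonsingular in the reduced space are all mine.

```python
import numpy as np
import scipy.linalg

def lda_via_pca(X, labels, d):
    """Naive LDA: PCA down to N - C dimensions, then LDA in the reduced space.

    X      : (F, N) data matrix with observations as columns.
    labels : length-N array of class labels.
    d      : number of discriminant directions to keep (d <= C - 1).
    Returns the total transform W = U Q of size (F, d).
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    C, N = len(classes), X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)

    # Step 1: PCA to N - C dimensions via the dot-product trick (Lemma 1).
    evals, V = np.linalg.eigh(Xc.T @ Xc)
    top = np.argsort(evals)[::-1][:N - C]
    U = Xc @ V[:, top] / np.sqrt(evals[top])
    Z = U.T @ Xc                                    # reduced data, size (N - C, N)

    # Step 2: LDA in the reduced space, where S_w is assumed nonsingular (Eq. 11).
    Sw = np.zeros((N - C, N - C))
    Sb = np.zeros((N - C, N - C))
    for c in classes:
        Zc = Z[:, labels == c]
        mc = Zc.mean(axis=1, keepdims=True)         # class mean (overall mean of Z is 0)
        Sw += (Zc - mc) @ (Zc - mc).T
        Sb += Zc.shape[1] * (mc @ mc.T)
    w, Q = scipy.linalg.eigh(Sb, Sw)                # generalized eigenproblem S_b q = w S_w q
    Q = Q[:, np.argsort(w)[::-1][:d]]               # keep the d largest eigenvalues

    # Step 3: total transform W = U Q.
    return U @ Q
```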

Properties. The scatter matrices have some interesting properties. Let us denote the N × N block-diagonal matrix

    M = diag{E_1, E_2, ..., E_C}                                                 (12)

where E_i is the N_{c_i} × N_{c_i} matrix whose entries are all equal to 1/N_{c_i},

    E_i = (1/N_{c_i}) 1 1^T                                                      (13)

with 1 the N_{c_i}-dimensional vector of ones. Note that M is idempotent, thus MM = M. Given the total scatter matrix S_t = XX^T (dropping the 1/N factor of Eq. 2, and recalling that the columns of X are centred by the overall mean µ), the between-class scatter matrix can be written as

    S_b = XMMX^T = XMX^T                                                         (14)

and the within-class scatter matrix as

    S_w = XX^T - XMX^T = X(I - M)X^T                                             (15)

Thus, we have that S_t = S_w + S_b. Note that since M is idempotent, I - M is also idempotent. Given the above properties, the objective function of Eq. 8 can be expressed as

    W_0 = arg max_W tr(W^T XMMX^T W)   subject to   W^T X(I - M)(I - M)X^T W = I     (16)

The optimization of this problem involves a procedure called simultaneous diagonalization. Let us assume that the final transform matrix has the form

    W = UQ                                                                       (17)

We aim to find the matrix U that diagonalizes S_w = X(I - M)(I - M)X^T. This practically means that, given the constraint of Eq. 16, we want

    W^T X(I - M)(I - M)X^T W = I  ⇒  Q^T [U^T X(I - M)(I - M)X^T U] Q = I        (18)

where the bracketed term is required to equal I. Consequently, using Eqs. 17 and 18, the objective function of Eq. 16 can be further expressed as

    Q_0 = arg max_Q tr(Q^T U^T XMMX^T UQ)   subject to   Q^T Q = I               (19)

where the constraint W^T X(I - M)(I - M)X^T W = I now has the form Q^T Q = I.
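These identities are easy to check numerically. The sketch below, with made-up toy sizes and labels, builds M from class labels and verifies the idempotency of M together with Eqs. 14 and 15.

```python
import numpy as np

rng = np.random.default_rng(1)
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])        # toy labels: C = 3 classes
F, N = 10, len(labels)
X = rng.standard_normal((F, N))
X = X - X.mean(axis=1, keepdims=True)                 # centre by the overall mean

# M = diag{E_1, ..., E_C} with E_i an N_ci x N_ci block of entries 1/N_ci (Eqs. 12-13).
M = np.zeros((N, N))
for c in np.unique(labels):
    idx = np.flatnonzero(labels == c)
    M[np.ix_(idx, idx)] = 1.0 / len(idx)

# Scatter matrices from their definitions (Eqs. 9 and 10).
Sw = np.zeros((F, F))
Sb = np.zeros((F, F))
for c in np.unique(labels):
    Xc = X[:, labels == c]
    mc = Xc.mean(axis=1, keepdims=True)               # class mean (overall mean is 0)
    Sw += (Xc - mc) @ (Xc - mc).T
    Sb += Xc.shape[1] * (mc @ mc.T)

assert np.allclose(M @ M, M)                          # M is idempotent
assert np.allclose(X @ M @ X.T, Sb)                   # S_b = X M X^T       (Eq. 14)
assert np.allclose(X @ (np.eye(N) - M) @ X.T, Sw)     # S_w = X (I - M) X^T (Eq. 15)
assert np.allclose(X @ X.T, Sb + Sw)                  # S_t = S_b + S_w
```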

Lemma 2. Assume the matrix X(I - M)(I - M)X^T = X_w X_w^T, where X_w is the F × N matrix X_w = X(I - M). By performing eigenanalysis on X_w^T X_w as X_w^T X_w = V_w Λ_w V_w^T, we get N - C positive eigenvalues; thus V_w is an N × (N - C) matrix.

The optimization problem of Eq. 19 can be solved in two steps:

1. Find U such that U^T X(I - M)(I - M)X^T U = I. By applying Lemma 2, we get U = X_w V_w Λ_w^{-1}. Note that U has size F × (N - C).
2. Find Q_0. Denoting by X_b = U^T XM the (N - C) × N matrix of projected class means, Eq. 19 becomes

    Q_0 = arg max_Q tr(Q^T X_b X_b^T Q)   subject to   Q^T Q = I                 (20)

which is equivalent to applying PCA on the matrix of projected class means. The final Q_0 is a matrix whose columns are the d eigenvectors of X_b X_b^T that correspond to the d largest eigenvalues (d ≤ C - 1). The final projection matrix is given by

    W_0 = U Q_0                                                                  (21)

Based on the above, the pseudocode for computing LDA is

Algorithm 2 Linear Discriminant Analysis
1: procedure LDA
2:   Find the eigenvectors of S_w that correspond to non-zero eigenvalues (usually N - C of them), i.e. U = [u_1, ..., u_{N-C}], by performing eigenanalysis on (I - M)X^T X(I - M) = V_w Λ_w V_w^T and computing U = X(I - M)V_w Λ_w^{-1} (this whitens S_w).
3:   Project the data as X_b = U^T XM.
4:   Perform PCA on X_b to find Q (i.e., compute the eigenanalysis of X_b X_b^T = QΛ_b Q^T).
5:   The total transform is W = UQ.

Locality preserving projections (LPP) can be computed in a similar fashion, where S is the connectivity matrix and D its diagonal degree matrix, as defined in the slides. The first step is to perform whitening of XDX^T. We do so by applying Lemma 1 and performing eigenanalysis of D^{1/2} X^T X D^{1/2} = V_p Λ_p V_p^T; the whitening transform is then U = X D^{1/2} V_p Λ_p^{-1}. The next step is to project the data as X_p = U^T X and find the eigenvectors Q of X_p (D - S) X_p^T that correspond to the lowest (but non-zero) eigenvalues. Then the total transform is W = UQ.
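Finally, returning to LDA, a possible numpy implementation of Algorithm 2 is sketched below; the function name, the centring step and the way the N - C positive eigenvalues are selected are assumptions of this sketch rather than prescriptions of the notes.

```python
import numpy as np

def lda_simultaneous_diag(X, labels, d):
    """LDA via simultaneous diagonalization (Algorithm 2).

    X      : (F, N) data matrix with observations as columns.
    labels : length-N array of class labels.
    d      : number of discriminant directions to keep (d <= C - 1).
    Returns the total transform W = U Q of size (F, d).
    """
    labels = np.asarray(labels)
    N = X.shape[1]
    classes = np.unique(labels)
    C = len(classes)
    Xc = X - X.mean(axis=1, keepdims=True)        # centre by the overall mean

    # Block-diagonal M with blocks of 1/N_ci (Eq. 12).
    M = np.zeros((N, N))
    for c in classes:
        idx = np.flatnonzero(labels == c)
        M[np.ix_(idx, idx)] = 1.0 / len(idx)

    # Step 2 of Algorithm 2: whiten S_w = X_w X_w^T with X_w = X(I - M).
    Xw = Xc @ (np.eye(N) - M)
    evals, Vw = np.linalg.eigh(Xw.T @ Xw)
    top = np.argsort(evals)[::-1][:N - C]         # the N - C positive eigenvalues
    U = Xw @ Vw[:, top] / evals[top]              # U = X_w V_w Lambda_w^{-1}, so U^T S_w U = I

    # Steps 3-4: PCA on the projected class means X_b = U^T X M.
    Xb = U.T @ Xc @ M
    evals_b, Q = np.linalg.eigh(Xb @ Xb.T)
    Q = Q[:, np.argsort(evals_b)[::-1][:d]]       # d <= C - 1 leading eigenvectors

    # Step 5: total transform W = U Q (Eq. 21).
    return U @ Q
```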