Principal Component Analysis (PCA)

Principal Component Analysis (PCA). Additional reading can be found in the non-assessed exercises (week 8) on this course unit's teaching page. Textbooks: Sect. 6.3 in [1] and Ch. 12 in [2].

Outline: Introduction, Principle, Algorithms, Exemplar Applications, Relevant Issues, Conclusions.

Introduction. Principal component analysis (PCA) is a method for high-dimensional data analysis via redundancy reduction: it identifies an optimal low-dimensional linear projection that maximises the data variance in the new space. It is useful for data visualization, compression and feature extraction. PCA finds a new coordinate system of maximum data variance; projecting the data onto the principal axes yields a new low-dimensional representation.

Principle. Finding the 1st principal component. Given a data set of N data points in a d-dimensional space, $X = \{x_1, \dots, x_N\}$.

Principle. Finding the 1st principal component (cont.)
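As a reminder of how the first principal component is found, here is the standard maximum-variance derivation (my notation for the sample mean and covariance; not verbatim from the slides).

```latex
% Standard maximum-variance derivation of the first principal component.
\begin{align*}
\bar{x} &= \frac{1}{N}\sum_{n=1}^{N} x_n, &
S &= \frac{1}{N}\sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^{\top}.
\end{align*}
% Maximize the variance of the projection u_1^T x under a unit-norm constraint:
\begin{equation*}
\max_{u_1}\; u_1^{\top} S\, u_1 \quad \text{subject to} \quad u_1^{\top} u_1 = 1.
\end{equation*}
% Stationarity of the Lagrangian u_1^T S u_1 + lambda_1 (1 - u_1^T u_1) gives
\begin{equation*}
S u_1 = \lambda_1 u_1,
\end{equation*}
% so u_1 is the eigenvector of S with the largest eigenvalue, and the
% projected variance u_1^T S u_1 equals lambda_1.
```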

Principle. General formulation. We want to find M (M < d) principal components, i.e., the eigenvectors $u_1, \dots, u_M$ of the covariance matrix $S$ associated with its M largest eigenvalues, ordered so that $\lambda_i \ge \lambda_j$ for $i < j$ and orthonormal, $u_i^\top u_j = \delta_{ij}$ ($\delta_{ij} = 1$ if $i = j$ and 0 otherwise).
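A minimal NumPy sketch of this formulation (function and variable names are my own, not from the slides): it returns the M largest eigenvalues of the sample covariance together with their eigenvectors.

```python
import numpy as np

def top_m_principal_components(X, M):
    """Return the top-M eigenvalues and eigenvectors of the sample covariance.

    X is a (d, N) array with one data point per column, as in the slides.
    """
    d, N = X.shape
    X_centred = X - X.mean(axis=1, keepdims=True)   # subtract the mean vector
    S = X_centred @ X_centred.T / N                  # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)             # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:M]            # pick the M largest
    return eigvals[order], eigvecs[:, order]         # lambda_1..lambda_M and U_M
```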

Principle. Data reconstruction after dimension reduction. From a compressed data point in the M-dimensional PCA space (M < d), we can reconstruct an approximation of the data point in the original d-dimensional space.

Principle. Perspective of minimizing reconstruction errors. PCA can also be formulated from the perspective of minimizing reconstruction error.
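For concreteness, the usual statement of that objective, consistent with the encoding and decoding formulas given later (not verbatim from the slides), is:

```latex
% Minimum reconstruction error view of PCA.
% Approximate each point by its projection onto an M-dimensional subspace with
% orthonormal basis U_M = [u_1, ..., u_M]:
\begin{equation*}
\tilde{x}_n = \bar{x} + U_M U_M^{\top}(x_n - \bar{x}),
\qquad
J(U_M) = \frac{1}{N}\sum_{n=1}^{N} \lVert x_n - \tilde{x}_n \rVert^2 .
\end{equation*}
% Minimizing J over orthonormal U_M gives the same solution as the
% maximum-variance view: the columns of U_M are the eigenvectors of S with the
% M largest eigenvalues, and the minimum error is the sum of the discarded
% eigenvalues, J_min = lambda_{M+1} + ... + lambda_d.
```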

Principle. Dual PCA. Idea: for a d x N (d >> N) matrix X, the dimensionality of the linear space spanned by the data is at most N. $S = \frac{1}{N} X X^\top$ is a d x d matrix, so solving its eigenvalue problem is often computationally infeasible; $S' = \frac{1}{N} X^\top X$ is an N x N matrix, so its eigenvalue problem is solvable. Fortunately, we can prove that S and S' share the same (nonzero) eigenvalues. If $v_i$ is an eigenvector of S' with eigenvalue $\lambda_i$, then $u_i = \frac{1}{\sqrt{N \lambda_i}} X v_i$ is the corresponding eigenvector of S, sharing the eigenvalue $\lambda_i$.
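A small NumPy check of this relationship (array sizes and names are illustrative): the eigenvectors of the N x N matrix $S'$ are mapped to unit-norm eigenvectors of the d x d matrix $S$ via $u_i = X v_i / \sqrt{N \lambda_i}$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 1000, 20                       # many dimensions, few points (d >> N)
X = rng.normal(size=(d, N))

S_dual = X.T @ X / N                  # N x N matrix S'
lam, V = np.linalg.eigh(S_dual)       # eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]        # sort in descending order

# Map each eigenvector v_i of S' to a unit-norm eigenvector u_i of S = X X^T / N.
U = X @ V / np.sqrt(N * lam)

# Check S u_i = lambda_i u_i (the d x d matrix is formed here only to verify).
S = X @ X.T / N
print(np.allclose(S @ U, U * lam))    # True
```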

Principle. Singular Value Decomposition (SVD). A d x N matrix X can be decomposed as $X = U \Sigma V^\top$, where U is a d x d orthogonal matrix whose column i is the i-th eigenvector of $X X^\top$; $\Sigma$ is a d x N diagonal matrix with $\sigma_{ii} = \sqrt{\lambda_i}$ and $\sigma_{ii} \ge \sigma_{jj}$ for $i < j$; and V is an N x N orthogonal matrix whose column i is the i-th eigenvector of $X^\top X$. Link to PCA: if the data are centralized by subtracting the mean, $X X^\top / N$ is the covariance matrix of X, and column i of U corresponds to the i-th principal component. The properties of SVD allow us to deal with a high-dimensional data set of few data points (d >> N), since the covariance matrix is never used directly.
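The link can also be checked numerically with NumPy's SVD (illustrative sizes): the left singular vectors of the centralized data are the principal components, and the squared singular values divided by N equal the eigenvalues of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 5, 200
X = rng.normal(size=(d, N))                    # d-dimensional data, one point per column
Xc = X - X.mean(axis=1, keepdims=True)         # centralized data

# SVD of the centralized data: Xc = U @ diag(sigma) @ Vt
U, sigma, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigen-decomposition of the covariance matrix, for comparison.
S = Xc @ Xc.T / N
eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]

print(np.allclose(sigma**2 / N, eigvals))      # True: sigma_i^2 / N = lambda_i
# The columns of U are the principal components (up to sign).
```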

Algorithms. Basic Algorithm.
- Data centralization: for a given d x N data set X (d < N), subtract the mean vector from all instances in X to obtain the centralized data set, denoted $\hat{X}$.
- Eigenanalysis: calculate $S = \hat{X} \hat{X}^\top / N$.
- Finding principal components: find all d eigenvalues, ranked so that $\lambda_1 \ge \dots \ge \lambda_d$, and their corresponding eigenvectors $u_1, \dots, u_d$.
- Select the eigenvectors corresponding to the top M (M < d) eigenvalues of S to form a projection matrix $U_M = [u_1, \dots, u_M]$.
- Encoding a data point: $z = U_M^\top (x - \bar{x})$, where z is an M-dimensional vector encoding a data point x.
- Reconstructing a data point (decoding): $\tilde{x} = \bar{x} + U_M z$, where $\tilde{x}$ is a d-dimensional approximation of the data point x.
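A direct NumPy transcription of these steps, as a sketch under the slide's conventions (data points stored as columns; function names are my own):

```python
import numpy as np

def pca_fit(X, M):
    """Basic algorithm: X is a (d, N) array with one data point per column."""
    d, N = X.shape
    x_bar = X.mean(axis=1, keepdims=True)        # mean vector, shape (d, 1)
    X_hat = X - x_bar                            # centralized data set
    S = X_hat @ X_hat.T / N                      # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]            # lambda_1 >= ... >= lambda_d
    U_M = eigvecs[:, order[:M]]                  # projection matrix [u_1, ..., u_M]
    return x_bar, U_M

def pca_encode(x, x_bar, U_M):
    """z = U_M^T (x - x_bar); x is a (d, 1) column vector, z is (M, 1)."""
    return U_M.T @ (x - x_bar)

def pca_decode(z, x_bar, U_M):
    """x_tilde = x_bar + U_M z; reconstruction in the original d-dimensional space."""
    return x_bar + U_M @ z
```

For a (d, 1) column vector x, pca_decode(pca_encode(x, x_bar, U_M), x_bar, U_M) returns its rank-M approximation.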

Algorithms. Dual Algorithm.
- Data centralization: for a given d x N data set X (d >> N), subtract the mean vector from all instances in X to obtain the centralized data set, denoted $\hat{X}$.
- SVD procedure: calculate $Y = \hat{X}^\top / \sqrt{N}$ and apply the SVD to Y, i.e., $Y = U \Sigma V^\top$, which yields a d x d matrix V.
- Finding principal components: select the first M (M < d) columns of V to form a projection matrix $U_M = [v_1, \dots, v_M]$.
- Encoding a data point: $z = U_M^\top (x - \bar{x})$, where z is an M-dimensional vector encoding a data point x.
- Reconstructing a data point (decoding): $\tilde{x} = \bar{x} + U_M z$, where $\tilde{x}$ is a d-dimensional approximation of the data point x.
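The dual algorithm translates just as directly; in this sketch (names are again my own) the only d-dimensional objects ever formed are the mean vector and the projection matrix.

```python
import numpy as np

def dual_pca_fit(X, M):
    """Dual algorithm: X is (d, N) with d >> N; the d x d covariance is never formed."""
    d, N = X.shape
    x_bar = X.mean(axis=1, keepdims=True)
    X_hat = X - x_bar
    Y = X_hat.T / np.sqrt(N)                 # N x d matrix with Y^T Y = X_hat X_hat^T / N
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    U_M = Vt[:M].T                           # first M right singular vectors [v_1, ..., v_M]
    return x_bar, U_M                        # requires M <= N; encode/decode as in the basic algorithm
```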

Examples. Example 1: synthetic data.

Examples. Example 2: visualization of high-dimensional data. PCA applied to the visualization of microarray data.

Examples. Example 3: data compression. A handwritten digit "3" data set of 600 images, each 100 x 100 = 10,000 pixels. (Figure panels: original images, principal components, reconstructed images.)

Examples. Example 4: feature extraction. Extract salient features (eigenfaces) from facial images to facilitate recognition.

Examples. Example 4: feature extraction (cont.). Eigenfaces are the eigenvectors of the covariance matrix of the vector space of human face images. A human face may be considered a combination of these standard faces; the principal eigenface looks like a bland, androgynous average human face.

Examples. Example 4: feature extraction (cont.). When properly weighted, eigenfaces can be summed together to create an approximate face, and remarkably few eigenvector terms are needed to give a fair likeness of most people's faces. Suppose we use M eigenfaces; a facial image is then represented by M coordinates in the PCA subspace: $z = U_M^\top x$, where $U_M$ is a $d^2 \times M$ matrix consisting of the top M eigenvectors (eigenfaces), $x$ is a vector of $d^2$ elements obtained by converting an image, and $z$ is a vector of M elements serving as the representation (features). Feature vectors of M elements are used in a face recognition system for both training and testing.
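A self-contained sketch of this pipeline on synthetic stand-in data (the face array, its size and the choice of M here are hypothetical); it uses the centred encoding from the algorithm slides and the dual/SVD route since the image dimension far exceeds the number of images.

```python
import numpy as np

rng = np.random.default_rng(2)
n_faces, h, w = 400, 100, 100                  # hypothetical face data set
faces = rng.random((n_faces, h, w))            # stand-in for real face images

X = faces.reshape(n_faces, -1).T               # d^2 x N matrix, one image per column
x_bar = X.mean(axis=1, keepdims=True)
X_hat = X - x_bar

# Dual/SVD route, since d^2 = 10,000 >> N = 400.
_, _, Vt = np.linalg.svd(X_hat.T / np.sqrt(n_faces), full_matrices=False)
M = 50
U_M = Vt[:M].T                                 # d^2 x M matrix of eigenfaces

# Each face is represented by M coordinates in the PCA subspace.
Z = U_M.T @ X_hat                              # M x N feature matrix for recognition
print(Z.shape)                                 # (50, 400)
```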

Relevant Issues. How to find an appropriate dimensionality M in the PCA space. In practice we use the Proportion of Variance (PoV):
$$\mathrm{PoV}(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i} = \frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_k + \cdots + \lambda_d}.$$
When PoV(k) >= 90%, the corresponding k is taken as M.
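In code the rule reads directly; the threshold and the example eigenvalues below are illustrative.

```python
import numpy as np

def choose_m_by_pov(eigvals, threshold=0.90):
    """Pick the smallest k whose proportion of variance reaches the threshold.

    eigvals: eigenvalues of the covariance matrix, sorted in descending order.
    """
    pov = np.cumsum(eigvals) / np.sum(eigvals)       # PoV(k) for k = 1, ..., d
    return int(np.searchsorted(pov, threshold) + 1)  # smallest k with PoV(k) >= threshold

# Example: lambda = [4, 2, 1, 0.5, 0.5] -> PoV = [0.5, 0.75, 0.875, 0.9375, 1.0]
print(choose_m_by_pov(np.array([4.0, 2.0, 1.0, 0.5, 0.5])))   # 4
```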

Relevant Issues. Limitations of the standard PCA. Are the dimensions of maximum data variance always the relevant dimensions to preserve? Not necessarily; other techniques are required, e.g., relevant component analysis (RCA) and linear discriminant analysis (LDA).

Relevant Issues. Limitations of the standard PCA (cont.). Should the goal be finding independent rather than merely pairwise uncorrelated/orthogonal dimensions? If so, another technique is required: independent component analysis (ICA). (Figure: PCA vs. ICA directions.)

Relevant Issues. Limitations of the standard PCA (cont.). Reducing the dimensionality of complex distributions may require nonlinear processing. Nonlinear PCA extensions preserve the proximity between points in the input space, i.e., the local topology of the distribution, which makes it possible to unfold certain structures in the input data while keeping the local topology. (Figures: nonlinear projection of a spiral; nonlinear projection of a horseshoe.)

Relevant Issues. Miscellaneous PCA extensions (more than 100 exist), e.g., probabilistic PCA, 2-D PCA, sparse PCA/scaled PCA, nonnegative matrix factorization, PCA mixtures and local PCA, principal curve and surface analysis, and kernel PCA.

Conclusions. PCA is a simple yet popular method for handling high-dimensional data and has inspired many other methods. It is a linear method for dimensionality reduction that projects the original data onto a new coordinate system so as to maximize data variance. PCA can be interpreted from several perspectives, which lead to different formulations. The standard PCA has a number of limitations, and several variants or extensions have been proposed to overcome them.