What is Principal Component Analysis?

What is Principal Component Analysis?
Principal component analysis (PCA) reduces the dimensionality of a data set by finding a new set of variables, smaller than the original set, that retains most of the sample's information. It is useful for the compression and classification of data. By information we mean the variation present in the sample, as given by the correlations between the original variables.

Principal Component Analysis
PCA is the most common form of factor analysis. The new variables/dimensions:
- are linear combinations of the original ones
- are uncorrelated with one another (orthogonal in the original dimension space)
- capture as much of the original variance in the data as possible
- are called Principal Components
- are ordered by the fraction of the total information each retains
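
As a quick illustration of these properties, here is a minimal numpy sketch (not part of the original slides; the synthetic data and variable names are made up): it centers a small data set, eigendecomposes the covariance matrix, and checks that the resulting components are uncorrelated and ordered by the fraction of variance they retain.

```python
import numpy as np

# A minimal sketch (not from the slides): compute principal components of a small
# synthetic data set and check the properties listed above.
rng = np.random.default_rng(0)
A = np.array([[3.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.2, 0.1]])
X = rng.normal(size=(200, 3)) @ A        # 200 samples, 3 correlated variables

Xc = X - X.mean(axis=0)                  # center each variable
C = np.cov(Xc, rowvar=False)             # covariance matrix of the original variables
eigvals, eigvecs = np.linalg.eigh(C)     # symmetric eigendecomposition
order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = Xc @ eigvecs                             # principal components: linear combinations of the originals
print(np.round(np.cov(Z, rowvar=False), 6))  # approximately diagonal: PCs are uncorrelated
print(eigvals / eigvals.sum())               # fraction of total variance retained by each PC
```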

Some Simple Demos
http://www.cs.mcgill.ca/~sqrt/dimr/dimreduction.html

What are the new axes?
(Figure: data plotted against Original Variable A and Original Variable B, with the new axes PC 1 and PC 2 drawn through the point cloud.)
- Orthogonal directions of greatest variance in the data
- Projections along PC 1 discriminate the data most along any one axis

Principal Components
- The first principal component is the direction of greatest variability (covariance) in the data.
- The second is the next orthogonal (uncorrelated) direction of greatest variability: first remove all the variability along the first component, and then find the next direction of greatest variability.
- And so on.
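
The "remove the variability, then find the next direction" description can be turned into a small deflation procedure. The sketch below is illustrative only (power iteration plus deflation, with made-up helper names, not the lecture's code); the directions it produces coincide, up to sign, with the eigenvectors of the covariance matrix discussed on the following slides.

```python
import numpy as np

# Illustrative sketch: find the direction of greatest variability,
# remove the variance along it, then repeat.
def top_direction(C, iters=500):
    """Power iteration: dominant eigenvector of a symmetric PSD matrix C."""
    u = np.random.default_rng(1).normal(size=C.shape[0])
    for _ in range(iters):
        u = C @ u
        u /= np.linalg.norm(u)
    return u

def pca_by_deflation(X, k):
    Xc = X - X.mean(axis=0)                    # center the data
    C = np.cov(Xc, rowvar=False)
    components = []
    for _ in range(k):
        u = top_direction(C)
        components.append(u)
        C = C - (u @ C @ u) * np.outer(u, u)   # remove the variability along u
    # Rows agree (up to sign) with the top-k eigenvectors of the covariance matrix.
    return np.array(components)
```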

Principal Components Analysis (PCA)
Principle
- Linear projection method to reduce the number of parameters
- Transforms a set of correlated variables into a new set of uncorrelated variables
- Maps the data into a space of lower dimensionality
- A form of unsupervised learning
Properties
- It can be viewed as a rotation of the existing axes to new positions in the space defined by the original variables
- The new axes are orthogonal and represent the directions of maximum variability

Computing the Components
Data points are vectors in a multidimensional space. The projection of a vector x onto an axis (dimension) u is u^T x. The direction of greatest variability is the one in which the average square of the projection is greatest, i.e. the u such that E((u^T x)^2) over all x is maximized. (For simplicity we subtract the mean along each dimension, centering the original axis system at the centroid of all data points.) This direction u is the direction of the first principal component.

Computing the Components
E((u^T x)^2) = E((u^T x)(u^T x)) = E(u^T x x^T u). The matrix C = x x^T contains the correlations (similarities) of the original axes based on how the data values project onto them. So we are looking for the u that maximizes u^T C u, subject to u being unit-length. This is maximized when u is the principal eigenvector of C, in which case u^T C u = u^T λ u = λ for unit-length u, where λ is the principal eigenvalue of the covariance matrix C. The eigenvalue is the amount of variability captured along that dimension.
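
A small numerical check of this claim, on randomly generated centered data (a sketch, not from the lecture): the quadratic form u^T C u evaluated at the principal eigenvector equals the principal eigenvalue, and is at least as large as its value at any random unit vector.

```python
import numpy as np

# Numeric check: among unit vectors u, u^T C u is maximized by the principal
# eigenvector, where it equals the principal eigenvalue.
rng = np.random.default_rng(2)
Xc = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
Xc -= Xc.mean(axis=0)                       # center the data
C = (Xc.T @ Xc) / len(Xc)                   # covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)        # eigenvalues in ascending order
u_star = eigvecs[:, -1]                     # principal eigenvector

samples = rng.normal(size=(1000, 4))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)   # random unit vectors
best_random = max(float(u @ C @ u) for u in samples)

print(float(u_star @ C @ u_star))   # equals eigvals[-1], the principal eigenvalue
print(eigvals[-1], best_random)     # best_random is no larger than the principal eigenvalue
```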

Why the Eigenvectors?
Maximise u^T x x^T u subject to u^T u = 1. Construct the Lagrangian u^T x x^T u − λ u^T u and set the vector of partial derivatives to zero: x x^T u − λ u = (x x^T − λ I) u = 0. Since u ≠ 0, u must be an eigenvector of x x^T with eigenvalue λ.
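
For reference, the same argument in display form, with the extra step (used on the previous slide) showing that the maximized value equals the eigenvalue:

```latex
\[
\begin{aligned}
&\max_{u}\; u^{\top} x x^{\top} u \quad \text{subject to } u^{\top} u = 1,
\qquad L(u,\lambda) = u^{\top} x x^{\top} u - \lambda\,(u^{\top} u - 1),\\
&\nabla_u L = 2\,x x^{\top} u - 2\lambda u = 0
\;\Longrightarrow\; x x^{\top} u = \lambda u,
\qquad u^{\top} x x^{\top} u = \lambda\, u^{\top} u = \lambda .
\end{aligned}
\]
```

So the constrained maximum is attained at the eigenvector with the largest eigenvalue, and the variance captured equals that eigenvalue.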

Singular Value Decomposition
The first root is called the principal eigenvalue, which has an associated orthonormal (u^T u = 1) eigenvector u. Subsequent roots are ordered such that λ_1 > λ_2 > ... > λ_M, with rank(d) non-zero values. The eigenvectors form an orthonormal basis, i.e. u_i^T u_j = δ_ij. The eigenvalue decomposition is C = x x^T = U Σ U^T, where U = [u_1, u_2, ..., u_M] and Σ = diag[λ_1, λ_2, ..., λ_M]. Similarly, the eigenvalue decomposition of x^T x is V Σ V^T. The SVD is closely related to the above: x = U Σ^{1/2} V^T, with left eigenvectors U, right eigenvectors V, and singular values equal to the square roots of the eigenvalues.
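
This eigendecomposition/SVD relationship can be checked directly with numpy. The sketch below uses assumed conventions (samples as the columns of a centered matrix X with well-separated variances) and is not the lecture's code.

```python
import numpy as np

# Check the SVD / eigendecomposition link: for a centered data matrix X with
# samples as columns, X X^T = U diag(lam) U^T and X = U S V^T with S = sqrt(lam),
# up to sign and ordering conventions.
rng = np.random.default_rng(3)
X = np.diag([5.0, 4.0, 3.0, 2.0, 1.0]) @ rng.normal(size=(5, 200))  # p = 5 variables, n = 200 samples
X -= X.mean(axis=1, keepdims=True)        # center each variable

eigvals, U_eig = np.linalg.eigh(X @ X.T)             # eigendecomposition of X X^T
U_svd, S, Vt = np.linalg.svd(X, full_matrices=False)

print(np.allclose(np.sort(S**2), np.sort(eigvals)))  # squared singular values == eigenvalues
# Columns of U_svd and U_eig give the same directions (signs and order may differ):
print(np.allclose(np.abs(U_svd.T @ U_eig[:, ::-1]), np.eye(5), atol=1e-6))
```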

Computing the Components
Similarly for the next axis, and so on. So the new axes are the eigenvectors of the covariance matrix of the original variables, which captures the similarities of the original variables based on how the data samples project onto them. Geometrically, this is centering followed by rotation, a linear transformation.

PCs, Variance and Least-Squares
- The first PC retains the greatest amount of variation in the sample.
- The k-th PC retains the k-th greatest fraction of the variation in the sample.
- The k-th largest eigenvalue of the covariance matrix C is the variance in the sample along the k-th PC.
- The least-squares view: PCs are a series of linear least-squares fits to a sample, each orthogonal to all previous ones.
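
A quick numeric confirmation of the second and third bullets, on synthetic data (a sketch, not from the slides): projecting the centered data onto the k-th eigenvector and taking the sample variance reproduces the k-th largest eigenvalue.

```python
import numpy as np

# The variance of the data projected onto the k-th eigenvector
# equals the k-th largest eigenvalue of the covariance matrix.
rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(C)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]     # sort descending

for k in range(3):
    proj = Xc @ eigvecs[:, k]                          # scores along the k-th PC
    print(k + 1, proj.var(ddof=1), eigvals[k])         # the two numbers agree
```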

How Many PCs?
For n original dimensions, the covariance matrix is n x n and has up to n eigenvectors, so up to n PCs. Where, then, does the dimensionality reduction come from?

Dimensionality Reduction
We can ignore the components of lesser significance.
(Figure: bar chart of variance (%) captured by PC1 through PC10.)
You do lose some information, but if the eigenvalues are small, you don't lose much:
- p dimensions in the original data
- calculate p eigenvectors and eigenvalues
- choose only the first d eigenvectors, based on their eigenvalues
- the final data set has only d dimensions
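
One common way to pick d, sketched below with an assumed 95% threshold and synthetic data (neither comes from the lecture), is to keep just enough eigenvectors to cover a chosen fraction of the total variance.

```python
import numpy as np

# Keep the first d eigenvectors whose eigenvalues account for a chosen
# fraction of the total variance (threshold is an illustrative choice).
def choose_d(eigvals_desc, threshold=0.95):
    ratio = np.cumsum(eigvals_desc) / np.sum(eigvals_desc)
    return int(np.searchsorted(ratio, threshold) + 1)

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 10)) @ rng.normal(size=(10, 10))
Xc = X - X.mean(axis=0)
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]   # descending

d = choose_d(eigvals)
print(d, np.cumsum(eigvals) / eigvals.sum())   # chosen d and the cumulative variance fractions
```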

Eigenvectors of a Correlation Matrix
(Figure only.)

Geometric picture of principal components
- z_1, the 1st PC, is a minimum-distance fit to a line in X space.
- z_2, the 2nd PC, is a minimum-distance fit to a line in the plane perpendicular to the 1st PC.
PCs are a series of linear least-squares fits to a sample, each orthogonal to all the previous ones.

Optimality property of PCA
Dimension reduction: the original data X (p x n) are mapped to Y = G^T X (d x n) by a projection matrix G (p x d). Reconstruction maps Y back to G(G^T X) (p x n), an approximation of X.

Optimality property of PCA
Main theoretical result: the matrix G consisting of the first d eigenvectors of the covariance matrix S solves the minimization problem

    min over G (p x d) of ||X − G(G^T X)||_F^2   subject to  G^T G = I_d

where ||X − G(G^T X)||_F^2 is the reconstruction error. The PCA projection minimizes the reconstruction error among all linear projections of size d.
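
This result can be checked numerically. The sketch below uses random data with assumed sizes p = 6, n = 300, d = 2 (not from the lecture): the PCA reconstruction error is no larger than that of random orthonormal projections of the same size, and it matches the sum of the discarded eigenvalues under this unnormalized convention.

```python
import numpy as np

# Compare the reconstruction error of the PCA projection with that of
# random orthonormal projections of the same size.
rng = np.random.default_rng(6)
p, n, d = 6, 300, 2
X = rng.normal(size=(p, p)) @ rng.normal(size=(p, n))   # p x n data, samples as columns
X -= X.mean(axis=1, keepdims=True)                      # center each variable

S = (X @ X.T) / n                                       # covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)
G = eigvecs[:, -d:]                                     # top-d eigenvectors (p x d)

def recon_error(G, X):
    return np.linalg.norm(X - G @ (G.T @ X), 'fro') ** 2

err_pca = recon_error(G, X)
# Random orthonormal G of the same shape, via QR of a random matrix:
errs_rand = [recon_error(np.linalg.qr(rng.normal(size=(p, d)))[0], X)
             for _ in range(200)]
print(err_pca, min(errs_rand))         # err_pca <= every random projection's error
print(err_pca, n * eigvals[:-d].sum()) # and equals n times the sum of the discarded eigenvalues
```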

Applications of PCA
- Eigenfaces for recognition. Turk and Pentland, 1991.
- Principal Component Analysis for clustering gene expression data. Yeung and Ruzzo, 2001.
- Probabilistic Disease Classification of Expression-Dependent Proteomic Data from Mass Spectrometry of Human Serum. Lilien, 2003.

PCA applications - Eigenfaces
The principal eigenface looks like a bland, androgynous average human face.
http://en.wikipedia.org/wiki/image:eigenfaces.png

Eigenfaces: Face Recognition
When properly weighted, eigenfaces can be summed together to create an approximate grayscale rendering of a human face. Remarkably few eigenvector terms are needed to give a fair likeness of most people's faces, so eigenfaces provide a means of applying data compression to faces for identification purposes. A similar idea underlies Expert Object Recognition in Video.

PCA for image compression
(Figure: the same image reconstructed with d = 1, 2, 4, 8, 16, 32, 64, and 100 principal components, shown alongside the original image.)
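
A sketch of this kind of compression follows. The image here is synthetic and the function name is made up; with a real grayscale image you would load it into the img array instead. Each row is treated as a sample, and only the top d principal components of the rows are kept.

```python
import numpy as np

# PCA-based compression of a grayscale image, keeping d components,
# in the spirit of the d = 1 ... 100 panels described above.
rng = np.random.default_rng(7)
img = rng.normal(size=(256, 256)).cumsum(axis=0).cumsum(axis=1)   # smooth-ish synthetic "image"

def pca_compress(img, d):
    """Keep the top-d principal components of the rows of img."""
    mean = img.mean(axis=0)
    Xc = img - mean                                   # center the rows
    C = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)
    G = eigvecs[:, -d:]                               # top-d eigenvectors
    return (Xc @ G) @ G.T + mean                      # project, then reconstruct

for d in (1, 2, 4, 8, 16, 32, 64, 100):
    approx = pca_compress(img, d)
    err = np.linalg.norm(img - approx) / np.linalg.norm(img)
    print(f"d={d:3d}  relative reconstruction error = {err:.4f}")
```

The error shrinks as d grows, mirroring the visual improvement from d = 1 up to the near-original reconstruction at d = 100.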