
Dimensionality Reduction Using PCA/LDA. Hongyu Li, School of Software Engineering, TongJi University. Fall 2014.

Dimensionality Reduction. One approach to dealing with high-dimensional data is to reduce their dimensionality: project the high-dimensional data onto a lower-dimensional sub-space using linear or non-linear transformations.

Dimensionality Reduction. Linear transformations are simple to compute and tractable:

Y = U^T X, where Y is k × 1, U^T is k × d, X is d × 1, and k << d (each new coordinate is b_i = u_i^T a for an input vector a).

Classical linear approaches: Principal Component Analysis (PCA) and Fisher Discriminant Analysis (FDA).

Each dimensionality reduction technique finds an appropriate transformation by satisfying certain criteria (e.g., information loss, data discrimination). The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variation present in the dataset.

Find a basis in a low-dimensional sub-space: approximate vectors by projecting them onto a low-dimensional sub-space.

(1) Original space representation: x = a_1 v_1 + a_2 v_2 + ... + a_N v_N, where v_1, v_2, ..., v_N is a basis of the original N-dimensional space.

(2) Lower-dimensional sub-space representation: x̂ = b_1 u_1 + b_2 u_2 + ... + b_K u_K, where u_1, u_2, ..., u_K is a basis of the K-dimensional sub-space (K < N).

Note: if K = N, then x̂ = x.

Example (K = N):

Information loss. Dimensionality reduction implies information loss! PCA preserves as much information as possible by minimizing the reconstruction error: min ||x - x̂||. What is the best lower-dimensional sub-space? The best low-dimensional space is centered at the sample mean and has directions determined by the "best" eigenvectors of the covariance matrix of the data x. By "best" eigenvectors we mean those corresponding to the largest eigenvalues (i.e., the principal components). Since the covariance matrix is real and symmetric, these eigenvectors are orthogonal and form a set of basis vectors.

Methodology. Suppose x_1, x_2, ..., x_M are N × 1 vectors.

Methodology cont. Projection coefficients: b_i = u_i^T (x - x̄).
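A minimal NumPy sketch of this methodology (sample mean, centering, covariance, eigendecomposition, then the coefficients b_i = u_i^T (x - x̄)); function and variable names are illustrative, not from the slides.

import numpy as np

def pca_basis(X, K):
    """X: M x N data matrix (M samples, each an N-dimensional vector).
    Returns the sample mean, the top-K eigenvectors, and all eigenvalues."""
    x_bar = X.mean(axis=0)                  # sample mean
    Xc = X - x_bar                          # centered data
    C = np.cov(Xc, rowvar=False)            # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # C is real and symmetric
    order = np.argsort(eigvals)[::-1]       # sort eigenvalues in decreasing order
    U = eigvecs[:, order[:K]]               # N x K matrix of the top-K eigenvectors
    return x_bar, U, eigvals[order]

def project(x, x_bar, U):
    """Projection coefficients b_i = u_i^T (x - x_bar)."""
    return U.T @ (x - x_bar)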

Eigenvalue spectrum (plot of λ_i versus i, with K and N marked on the index axis).

Linear transformation implied by PCA. The linear transformation R^N → R^K that performs the dimensionality reduction is:
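In the standard formulation (consistent with the coefficients b_i = u_i^T (x - x̄) above, and stated here as an assumption rather than this slide's exact equation):

y = U^T (x - x̄), where U = [u_1 u_2 ... u_K], so that y = (b_1, b_2, ..., b_K)^T.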

Geometric interpretation. PCA projects the data along the directions where the data varies the most. These directions are determined by the eigenvectors of the covariance matrix corresponding to the largest eigenvalues. The magnitude of the eigenvalues corresponds to the variance of the data along the eigenvector directions.

How many principal components to keep? To choose K, you can use the following criterion:
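A common criterion, given here as the standard choice rather than this slide's exact one, is to pick the smallest K whose eigenvalues capture a chosen fraction of the total variance:

(λ_1 + λ_2 + ... + λ_K) / (λ_1 + λ_2 + ... + λ_N) > T, with T typically 0.9 or 0.95.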

What is the error due to dimensionality reduction? It can be shown that the average error due to dimensionality reduction is equal to:
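The standard result, stated here as an assumption about what this slide shows, is that the mean-squared reconstruction error equals the sum of the discarded eigenvalues:

e = λ_{K+1} + λ_{K+2} + ... + λ_N

(some derivations include an extra factor of 1/2, depending on how the average error is defined).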

Standardization. The principal components are dependent on the units used to measure the original variables, as well as on the range of values they assume. We should always standardize the data prior to using PCA. A common standardization method is to transform all the data to have zero mean and unit standard deviation:
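The usual per-variable z-score standardization is x_i' = (x_i - μ_i) / σ_i, where μ_i and σ_i are the sample mean and standard deviation of variable i. A minimal NumPy sketch, with illustrative names:

import numpy as np

def standardize(X):
    """Scale each column (variable) of X to zero mean and unit standard deviation.
    Assumes no column has zero variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)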

Case Study: Eigenfaces for Face Detection/Recognition. M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991. Face Recognition. The simplest approach is to think of it as a template-matching problem. Problems arise when performing recognition in a high-dimensional space. Significant improvements can be achieved by first mapping the data into a lower-dimensional space. How do we find this lower-dimensional space?

Main idea behind eigenfaces

Computation of the eigenfaces

Computation of the eigenfaces cont.

Computation of the eigenfaces cont.

Computation of the eigenfaces cont.
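A minimal NumPy sketch of the standard Turk-Pentland computation (diagonalize the small M x M matrix A^T A instead of the huge covariance A A^T when the number of training faces M is far smaller than the number of pixels); names are illustrative, not from the slides.

import numpy as np

def eigenfaces(faces, K):
    """faces: M x P matrix, each row a flattened face image with P pixels."""
    mean_face = faces.mean(axis=0)
    A = (faces - mean_face).T              # P x M matrix of centered face columns
    small = A.T @ A                        # M x M instead of P x P
    vals, V = np.linalg.eigh(small)
    order = np.argsort(vals)[::-1][:K]
    U = A @ V[:, order]                    # eigenfaces u_i = A v_i of the covariance A A^T
    U /= np.linalg.norm(U, axis=0)         # normalize each eigenface to unit length
    return mean_face, U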

Representing faces in this basis

Eigenvalue spectrum (plot of λ_i versus i, with K, M, and N marked on the index axis)

Representing faces in this basis cont.

Face Recognition Using Eigenfaces

Face Recognition Using Eigenfaces cont. The distance e_r is called the distance within the face space (difs). Comment: we can use the common Euclidean distance to compute e_r; however, it has been reported that the Mahalanobis distance performs better:
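A minimal sketch of the two distance options in the K-dimensional face space, assuming the coefficient vectors and the eigenvalues λ_i of the kept components are available; the Mahalanobis form shown, weighting each coefficient by 1/λ_i, is one common choice rather than necessarily this slide's exact formula.

import numpy as np

def euclidean_difs(w1, w2):
    """Euclidean distance between two coefficient vectors in face space."""
    return np.linalg.norm(w1 - w2)

def mahalanobis_difs(w1, w2, eigvals):
    # weight each eigenface coefficient by the inverse of its eigenvalue (its variance)
    return np.sqrt(np.sum((w1 - w2) ** 2 / eigvals))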

Face Detection Using Eigenfaces

Face Detection Using Eigenfaces cont.

Reconstruction of faces and non-faces

Applications. Face detection, tracking, and recognition, based on dffs (the distance from face space).

Problems in face recognition. Background: de-emphasize the outside of the face, e.g., by multiplying the input image by a 2D Gaussian window centered on the face (see the sketch after this paragraph). Lighting conditions: performance degrades with light changes. Scale: performance decreases quickly with changes to head size; remedies include multi-scale eigenspaces or scaling the input image to multiple sizes. Orientation: performance decreases, but not as fast as with scale changes; in-plane rotations can be handled, while out-of-plane rotations are more difficult to handle.
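A minimal sketch of the background de-emphasis step, assuming a grayscale image stored as a NumPy array with the face roughly centered; the window width sigma_frac is an illustrative choice, not a value from the slides.

import numpy as np

def deemphasize_background(image, sigma_frac=0.4):
    """Multiply the image by a 2D Gaussian window centered on the image."""
    h, w = image.shape
    y, x = np.mgrid[0:h, 0:w]
    sigma = sigma_frac * min(h, w)
    window = np.exp(-((x - w / 2) ** 2 + (y - h / 2) ** 2) / (2 * sigma ** 2))
    return image * window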

Dataset: 16 subjects; 3 orientations; 3 sizes; 3 lighting conditions; 6 resolutions (512x512 ... 16x16). Total number of images: 2,592.

Experiment 1. Used various sets of 16 images for training: one image per person, taken under the same conditions. Classified the remaining images as one of the 16 individuals; 7 eigenfaces were used; no rejections (i.e., no threshold on difs). Performed a large number of experiments and averaged the results: 96% correct averaged over light variation, 85% correct averaged over orientation variation, 64% correct averaged over size variation.

Experiment 2. Considered rejections (i.e., by thresholding difs); there is a tradeoff between correct recognition and rejections. Adjusting the threshold to achieve 100% recognition accuracy resulted in: 19% rejections while varying lighting, 39% rejections while varying orientation, 60% rejections while varying size. Experiment 3. Reconstruction using partial information.

PCA and classification. PCA is not always an optimal dimensionality-reduction procedure for classification purposes. Multiple classes and PCA: suppose there are C classes in the training data. PCA is based on the sample covariance, which characterizes the scatter of the entire data set irrespective of class membership, so the projection axes chosen by PCA might not provide good discrimination power.

PCA and classification (cont'd)

Linear Discriminant Analysis (LDA). What is the goal of LDA? To perform dimensionality reduction while preserving as much of the class-discriminatory information as possible. LDA seeks directions along which the classes are best separated; it takes into consideration not only the within-class scatter but also the between-class scatter, and is therefore more capable of distinguishing image variation due to identity from variation due to other sources such as illumination and expression.

Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis (LDA). Notation: the within-class scatter matrix is

S_w = Σ_{i=1..C} Σ_{j=1..M_i} (x_j - μ_i)(x_j - μ_i)^T,

where C is the number of classes, M_i the number of samples in class i, and μ_i the mean of class i. (S_b has at most rank C-1: each sub-matrix has rank 1 or less, being an outer product of two vectors.)
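A standard definition of the between-class scatter matrix, consistent with the rank remark above and given here as an assumption rather than this slide's exact formula, is

S_b = Σ_{i=1..C} M_i (μ_i - μ)(μ_i - μ)^T,

where μ is the overall mean; each term is a rank-one outer product, so S_b has rank at most C-1. (Some formulations omit the weights M_i.)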

Linear Discriminant Analysis (LDA). Methodology: project the data with a projection matrix U, y = U^T x. LDA computes a transformation that maximizes the between-class scatter while minimizing the within-class scatter:

max_U |U^T S_b U| / |U^T S_w U|

(|U^T S_b U| and |U^T S_w U| are the determinants of the between- and within-class scatter matrices of the projected data y; each determinant is the product of the corresponding eigenvalues).

Linear Discriminant Analysis (LDA). Linear transformation implied by LDA. The LDA solution is given by the eigenvectors of the generalized eigenvalue problem shown below. The linear transformation is given by a matrix U whose columns are the eigenvectors of that problem (the so-called Fisherfaces). Important: since S_b has at most rank C-1, the maximum number of eigenvectors with non-zero eigenvalues is C-1 (i.e., the maximum dimensionality of the sub-space is C-1).
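Consistent with the slides that follow, the generalized eigenvalue problem meant here is

S_b u_k = λ_k S_w u_k,

which becomes S_w^{-1} S_b u_k = λ_k u_k when S_w is non-singular.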

Linear Discriminant Analysis (LDA). Does S_w^{-1} always exist? If S_w is non-singular, we can obtain a conventional eigenvalue problem by writing S_w^{-1} S_b u_k = λ_k u_k. In practice, however, S_w is often singular, since the data are image vectors with large dimensionality while the size of the data set is much smaller (M << N).

Linear Discriminant Analysis (LDA). Does S_w^{-1} always exist? cont. To alleviate this problem, we can use PCA first: 1) PCA is first applied to the data set to reduce its dimensionality; 2) LDA is then applied to find the most discriminative directions in the reduced space.
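A minimal sketch of this two-step procedure, reusing the pca_basis helper sketched earlier; reducing to M - C dimensions before LDA follows the Fisherfaces recipe of Belhumeur et al. and is an assumption here, not something stated on this slide. Names are illustrative.

import numpy as np
from scipy.linalg import eigh

def pca_plus_lda(X, labels):
    """X: M x N data matrix, labels: length-M array of class labels."""
    classes = np.unique(labels)
    M, C = X.shape[0], len(classes)
    # Step 1: PCA down to M - C dimensions so that S_w becomes non-singular
    x_bar, U_pca, _ = pca_basis(X, M - C)          # from the earlier PCA sketch
    Z = (X - x_bar) @ U_pca                        # reduced data, M x (M - C)
    # Step 2: LDA in the reduced space
    mu = Z.mean(axis=0)
    d = Z.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Zc = Z[labels == c]
        mu_c = Zc.mean(axis=0)
        Sw += (Zc - mu_c).T @ (Zc - mu_c)
        Sb += len(Zc) * np.outer(mu_c - mu, mu_c - mu)
    # Generalized eigenvalue problem S_b u = lambda S_w u; keep the top C-1 directions
    vals, V = eigh(Sb, Sw)
    U_lda = V[:, np.argsort(vals)[::-1][:C - 1]]
    return x_bar, U_pca @ U_lda                    # overall N x (C-1) projection matrix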

Linear Discriminant Analysis (LDA). Case Study: Using Discriminant Eigenfeatures for Image Retrieval. D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, 1996. Content-based image retrieval: the application studied here is query-by-example image retrieval. The paper deals with the problem of selecting a good set of image features for content-based image retrieval.

Linear Discriminant Analysis (LDA). Assumptions: "well-framed" images are required as input for training and for query-by-example test probes. Only a small variation in the size, position, and orientation of the objects in the images is allowed.

Linear Discriminant Analysis (LDA). Some terminology: Most Expressive Features (MEF) are the features (projections) obtained using PCA; Most Discriminating Features (MDF) are the features (projections) obtained using LDA. Numerical problems: when computing the eigenvalues/eigenvectors of S_w^{-1} S_b u_k = λ_k u_k numerically, the computations can be unstable since S_w^{-1} S_b is not always symmetric. See the paper for a way to find the eigenvalues/eigenvectors in a stable way.

Linear Discriminant Analysis (LDA). Factors unrelated to classification: MEF vectors show the tendency of PCA to capture major variations in the training set, such as lighting direction; MDF vectors discount those factors unrelated to classification.

Linear Discriminant Analysis (LDA). Clustering effect. Methodology: 1) Generate the set of MEFs/MDFs for each image in the training set. 2) Given a query image, compute its MEFs/MDFs using the same procedure. 3) Find the k closest neighbors for retrieval (e.g., using Euclidean distance).
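A minimal sketch of step 3, assuming the query features and the database features (MEFs or MDFs) have already been computed; names are illustrative.

import numpy as np

def retrieve(query_feat, db_feats, k=5):
    """Return the indices of the k database images closest to the query feature vector."""
    dists = np.linalg.norm(db_feats - query_feat, axis=1)   # Euclidean distance to each image
    return np.argsort(dists)[:k]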

Linear Discriminant Analysis (LDA). Experiments and results: face images. A set of face images was used, with 2 expressions and 3 lighting conditions. Testing was performed using a disjoint set of images: one image, randomly chosen, from each individual.

Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis (LDA). Examples of correct search probes

Linear Discriminant Analysis (LDA). Example of a failed search probe

Linear Discriminant Analysis (LDA). Case Study: PCA versus LDA. A. Martinez, A. Kak, "PCA versus LDA", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, 2001. Is LDA always better than PCA? There has been a tendency in the computer vision community to prefer LDA over PCA, mainly because LDA deals directly with discrimination between classes while PCA does not pay attention to the underlying class structure. Main results of this study: (1) when the training set is small, PCA can outperform LDA; (2) when the number of samples is large and representative of each class, LDA outperforms PCA.

Linear Discriminant Analysis (LDA). Is LDA always better than PCA? cont.

Linear Discriminant Analysis (LDA). Is LDA always better than PCA? cont. LDA is not always better when the training set is small.

Linear Discriminant Analysis (LDA). Is LDA always better than PCA? cont. LDA outperforms PCA when the training set is large.

References: Face Recognition Using Dimensionality Reduction
1. M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
2. P. Belhumeur, J. Hespanha, D. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, 1997.
3. A. Martinez, A. Kak, "PCA versus LDA", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, 2001.