COMP 408/508 Computer Vision, Fall 2017: PCA for Recognition


Recall: Color Gradient by PCA

D = [ R_x  R_y
      G_x  G_y
      B_x  B_y ]

v_1, v_2: eigenvectors of D^T D, with v_1 ⊥ v_2
λ_1, λ_2: eigenvalues of D^T D, with λ_1 > λ_2

(Figure: per-channel gradient vectors (R_x, R_y), (G_x, G_y), (B_x, B_y) and the principal directions v_1 (λ_1) and v_2 (λ_2).)
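As a quick sanity check, here is a minimal NumPy sketch of this construction (the pixel derivative values and variable names are mine, for illustration only): it stacks the per-channel derivatives into the 3×2 matrix D and reads the dominant gradient direction and strength off the eigendecomposition of D^T D.

```python
import numpy as np

# Per-channel image derivatives at one pixel (hypothetical example values).
Rx, Ry = 0.9, 0.1
Gx, Gy = 0.8, 0.2
Bx, By = 0.7, 0.0

D = np.array([[Rx, Ry],
              [Gx, Gy],
              [Bx, By]])            # 3x2 matrix of channel gradients

# Eigen-decomposition of the 2x2 matrix D^T D.
eigvals, eigvecs = np.linalg.eigh(D.T @ D)

# eigh returns eigenvalues in ascending order; the last one is lambda_1.
lam1, v1 = eigvals[-1], eigvecs[:, -1]
print("dominant gradient direction:", v1, "strength:", np.sqrt(lam1))
```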

Principal Component Analysis (PCA)
Let x_1, x_2, ..., x_K be K realizations of the random vector x of dimension N. PCA finds a set of M orthonormal vectors v_1, v_2, ..., v_M that best describes x in the mean-square-error sense (M < N). The problem is equivalent to minimization of the mean square error ξ as follows:

ξ = (1/K) Σ_{i=1}^{K} ||x_i − x̂_i||²,  where x̂_i = Σ_{m=1}^{M} (x_i^T v_m) v_m

ξ ≈ E{ ||x − x̂||² },  E: expected value

This reduces to finding the M eigenvectors associated with the M largest eigenvalues of the N×N correlation or covariance matrix C:

C = (1/K) Φ Φ^T

where Φ = [x_1  x_2  ...  x_K]  (correlation),
or Φ = [x_1 − µ  x_2 − µ  ...  x_K − µ] with µ = (1/K) Σ_{k=1}^{K} x_k  (covariance).

Geometrical interpretation
PCA projects the data along the directions where the data varies the most. These directions are determined by the eigenvectors of the covariance matrix corresponding to the largest eigenvalues. The magnitude of the eigenvalues is proportional to the variance of the data along the eigenvector directions.
(Figure: 2D data cloud with principal directions v_1 and v_2.)

Covariance or Correlation?
With the correlation matrix: Φ = [x_1  x_2  ...  x_K], C = (1/K) Φ Φ^T. Describes the energy.
With the covariance matrix: Φ = [x_1 − µ  x_2 − µ  ...  x_K − µ], C = (1/K) Φ Φ^T. Describes the variation.

Recursive Interpretation of PCA
Find the direction of the first principal component by

v_1 = arg max_{||v||=1} E{ (x^T v)² }

Thus the first principal component is the projection onto the direction along which the energy (or the variance) of the projection is maximized. Having determined the first m−1 principal components, the m-th principal component is determined as the principal component of the residual:

v_m = arg max_{||v||=1} E{ [ ( x − Σ_{n=1}^{m−1} (x^T v_n) v_n )^T v ]² }
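A small NumPy sketch of this recursive view (synthetic data and variable names are mine): find the unit direction of maximum projected energy, remove that component from the data, and repeat. The directions found this way coincide with the covariance eigenvectors, which the last line checks for the first component.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))   # K x N synthetic data
X = X - X.mean(axis=0)                                     # zero-mean

residual = X.copy()
components = []
for m in range(3):                                # first 3 principal components
    C = residual.T @ residual / len(residual)     # covariance of the residual
    w, V = np.linalg.eigh(C)
    v = V[:, -1]                                  # direction maximizing E{(x^T v)^2}
    components.append(v)
    residual = residual - np.outer(residual @ v, v)   # remove that component

# The same direction comes directly from the data covariance eigenvectors
# (up to a sign flip, hence the comparison of absolute values).
w, V = np.linalg.eigh(X.T @ X / len(X))
print(np.allclose(np.abs(components[0]), np.abs(V[:, -1])))
```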

Where to use PCA? Everywhere!
PCA is and can be used in various disciplines: dimension reduction, decorrelation, efficient and meaningful data representation, optimization.
In computer vision: recognition, retrieval, compression, 2D or 3D feature representation and detection, fitting, etc.

PCA for Dimension Reduction
Suppose we have a high-dimensional feature vector x that represents the data. Significant improvements can be achieved by first mapping the data into a lower-dimensional space, both for recognition and for computational reasons. The curse of dimensionality is an important problem. The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variation present in the original dataset:

x = Σ_{n=1}^{N} x_n u_n  (high dimension N)        x̂ = Σ_{m=1}^{M} b_m v_m  (low dimension M, M << N)

Dimensionality reduction implies information loss! Preserve as much information as possible, that is, minimize the expected error E(||x − x̂||²).

PCA Methodology
The best low-dimensional space is determined by the "best" eigenvectors of the covariance matrix of x (i.e., the eigenvectors corresponding to the "largest" eigenvalues, also called "principal components"). Suppose we have K realizations x_1, x_2, ..., x_K:
1. Compute the symmetric N×N covariance matrix C.
2. Compute the eigenvalues: λ_1 ≥ λ_2 ≥ ... ≥ λ_N.
3. Compute the eigenvectors: v_1, v_2, ..., v_N.
4. Reduce the dimension by keeping only the terms corresponding to the M largest eigenvalues: x̂ − µ = Σ_{m=1}^{M} b_m v_m, M << N.
5. Form the new feature vector b = [b_1 b_2 ... b_M]^T, where b_m = (x − µ)^T v_m.
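The five steps above map directly onto a few lines of NumPy. This is a minimal sketch under the slide's notation, with the realizations x_k stored as rows of a data matrix; the function and variable names are mine.

```python
import numpy as np

def pca_fit(X, M):
    """X: K x N data matrix (one realization per row). Returns mean, top-M eigenvectors, eigenvalues."""
    mu = X.mean(axis=0)                           # sample mean
    C = np.cov(X - mu, rowvar=False)              # step 1: N x N covariance matrix
                                                  # (1/(K-1) vs 1/K only rescales the eigenvalues)
    eigvals, eigvecs = np.linalg.eigh(C)          # steps 2-3: eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]             # sort so that lambda_1 >= lambda_2 >= ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return mu, eigvecs[:, :M], eigvals[:M]        # step 4: keep the M largest

def pca_project(x, mu, V):
    return (x - mu) @ V                           # step 5: feature vector b, b_m = (x - mu)^T v_m

def pca_reconstruct(b, mu, V):
    return mu + V @ b                             # x_hat = mu + sum_m b_m v_m
```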

Properties of PCA
PCA decorrelates the data; the new features b_i are uncorrelated, since their covariance matrix is diagonalized by the PCA transform:

V^T C V = diag(λ_1, λ_2, ..., λ_M),  where V = [v_1 v_2 ... v_M]

The covariance matrix represents only second-order statistics among the vector values. Hence there may still remain higher-order correlations after the PCA transformation.

Standardization
The principal components depend on the units used to measure the original variables as well as on the range of values they assume. We may need to standardize the data prior to using PCA. A common standardization method is to transform all the data to have zero mean and unit standard deviation: (x_n − µ_n) / σ_n.

How to choose M?  Σ_{m=1}^{M} λ_m / Σ_{n=1}^{N} λ_n > threshold (e.g. ~0.9)
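A short sketch of the two points above: standardize each variable to zero mean and unit standard deviation before PCA, and choose M as the smallest number of components whose eigenvalues capture a given fraction of the total variance (the 0.9 threshold follows the slide; function names are mine).

```python
import numpy as np

def standardize(X):
    # zero mean and unit standard deviation per variable (column)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def choose_M(eigvals, threshold=0.9):
    # eigvals sorted in descending order; smallest M whose cumulative ratio exceeds the threshold
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(ratio, threshold) + 1)
```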

Example of PCA: Eigenfaces for Face Recognition
The eigenface technique was proposed by M. Turk and A. Pentland, 1991. Eigenfaces correspond to the eigenvectors of the covariance matrix of a set of face images. Every image can be expressed as a linear combination of eigenfaces, and the eigenface coefficients constitute the reduced low-dimensional feature vector. It is an appearance-based (i.e., intensity-based) technique and works on gray-scale images. Face images must be centered and of the same size, so face detection and normalization may be necessary. (In the example face images, note that pose and lighting differ.)

Computation of Eigenfaces
Take a set of N×N training face images I_1, I_2, ..., I_K and express each as a vector of size N²: x_1, x_2, ..., x_K. Compute the N²×N² covariance matrix C:

C = (1/K) X X^T,  X = [x_1 − µ  ...  x_K − µ]

N²×N² is too large! Instead use the K×K matrix X^T X. The eigenvalues of X^T X are the same as the K largest eigenvalues of X X^T. The eigenvectors are computed as v_k = X w_k, where v_k and w_k are the eigenvectors of X X^T and X^T X, respectively. Normalize each v_k such that ||v_k|| = 1. Keep the largest M eigenvalues and the associated eigenvectors, i.e., the eigenfaces.
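A minimal NumPy sketch of the trick described above: diagonalize the small K×K matrix X^T X instead of the huge covariance matrix, then map its eigenvectors back with v_k = X w_k and normalize. Function and variable names are mine.

```python
import numpy as np

def eigenfaces(face_vectors, M):
    """face_vectors: K x P matrix, one flattened face image (length P = N*N) per row."""
    K, P = face_vectors.shape
    mu = face_vectors.mean(axis=0)
    X = (face_vectors - mu).T                 # P x K matrix of mean-subtracted faces (columns)

    S = X.T @ X                               # small K x K matrix instead of P x P
    eigvals, W = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:M]     # keep the M largest eigenvalues
    eigvals, W = eigvals[order], W[:, order]

    V = X @ W                                 # map back: v_k = X w_k (columns are eigenfaces)
    V = V / np.linalg.norm(V, axis=0)         # normalize each eigenface to unit length
    return mu, V, eigvals / K                 # eigenvalues of C = (1/K) X X^T
```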

Representing Faces in Eigenspace
Each normalized face image vector x_k in the training set can be approximated as a linear combination of the M eigenfaces:

x̂_k − µ = Σ_{m=1}^{M} b_m v_m = b_1 v_1 + b_2 v_2 + ...

Hence each normalized face image vector x_k in the training set can be represented by an eigenface coefficient vector b_k = [b_1 ... b_M]^T.

Face Recognition using Eigenfaces
Given an unknown image vector x:
Project the mean-normalized vector onto the eigenspace and represent it by the eigenface coefficient vector b.
Find r = arg min_k ||b − b_k||.
If min dist < threshold (that is, the distance to the closest coefficient vector is less than a threshold), then the face is recognized as face r from the training set. Otherwise it is rejected.
It may be better to use a scaled Euclidean (or Mahalanobis) distance:

d(b, b_k) = Σ_{m=1}^{M} (1/λ_m) (b_m − b_{mk})²
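A sketch of this matching rule, assuming the eigenfaces, the training coefficient vectors b_k and the eigenvalues λ_m have already been computed (e.g., with the eigenface routine sketched earlier); the threshold is a free parameter and the names are mine.

```python
import numpy as np

def recognize(x, mu, V, train_coeffs, eigvals, threshold):
    """x: flattened test image; V: P x M eigenfaces; train_coeffs: K x M matrix of b_k vectors."""
    b = (x - mu) @ V                                        # project onto the eigenspace
    # Scaled Euclidean (Mahalanobis-like) squared distance: weight axis m by 1/lambda_m.
    d2 = np.sum((train_coeffs - b) ** 2 / eigvals, axis=1)
    r = int(np.argmin(d2))
    return r if d2[r] < threshold else None                 # None means rejected (unknown face)
```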

How to choose M?
One alternative:  Σ_{m=1}^{M} λ_m / Σ_{n=1}^{N} λ_n > threshold (e.g. ~0.9)
Another alternative is to optimize M over test/validation data.

Problems of the Eigenface technique
Sensitive to rotation, scale and translation.
Sensitive to lighting variations.
Background interference.
Thus face images should be preprocessed to lessen the effects of possible variations. Variations such as lighting and rotation can also be taken into account during training: the training dataset may include samples with such variations.

Face detection
Can be thought of as a 2-class recognition problem: face or non-face. Different alternatives exist:
OpenCV has a face detection module based on Haar features. It exploits local features such as edges and line patterns and scans a given image at different scales with a sliding window. It is scale, translation and lighting invariant; however it is sensitive to rotation.
Neural networks, SVMs, ...: need lots of training samples (also for the non-face class).
Eigenface detection (a simpler approach): for example, use the approximation error due to dimension reduction as a "faceness" measure.

Multi-scale face detection with a sliding window: the sliding window size doesn't change across scales; the image is rescaled instead.

Face Detection using Eigenfaces
Given an unknown image vector x:
Project the mean-normalized vector onto the eigenspace and represent it by the eigenface coefficient vector b.
Compute the reconstruction error e = ||x − x̂||.
If e < threshold, then the image is detected as a face; otherwise it is a non-face.
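A sketch of this reconstruction-error ("faceness") test; the threshold is again a free parameter and the names are mine.

```python
import numpy as np

def is_face(x, mu, V, threshold):
    """x: flattened image patch; V: P x M matrix of eigenfaces (orthonormal columns)."""
    b = (x - mu) @ V                     # project onto the face space
    x_hat = mu + V @ b                   # reconstruct from the M coefficients
    err = np.linalg.norm(x - x_hat)      # distance from the face space
    return err < threshold
```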

Face Detection using Eigenfaces
Another possibility is to train an SVM classifier with the eigenface coefficients as features (see HW3).

PCA performance for recognition
PCA is not always an optimal dimensionality-reduction procedure for classification purposes: the direction of maximum variance is not necessarily the direction that best separates the classes.

Linear Discriminant Analysis (LDA)
The objective of LDA is to perform dimensionality reduction while preserving as much of the class discriminatory information as possible. It seeks to find directions along which the classes are best separated. It does so by taking into consideration the within-class scatter as well as the between-class scatter. It is also more capable of distinguishing variations due to class identity from variations due to other sources, such as illumination and expression in face recognition.

Methodology
Suppose there are R classes. Let µ_r be the mean feature vector for class r, let K_r be the number of training samples from class r, and let K = Σ_r K_r be the total number of samples.

Within-class scatter matrix:   S_w = Σ_{r=1}^{R} Σ_{k=1}^{K_r} (x_k − µ_r)(x_k − µ_r)^T
Between-class scatter matrix:  S_b = Σ_{r=1}^{R} (µ_r − µ)(µ_r − µ)^T,  where µ = (1/R) Σ_{r=1}^{R} µ_r

LDA computes a transformation V that maximizes the between-class scatter while minimizing the within-class scatter:

maximize  det(V^T S_b V) / det(V^T S_w V)

Such a transformation retains class separability while reducing the variation due to sources other than class identity (e.g., illumination).

LDA transformation (Fisherfaces)
The optimal linear transformation is given by a matrix V whose columns are the eigenvectors of S_w^{-1} S_b (called Fisherfaces in the case of face recognition):

b = [b_1 b_2 ... b_L]^T = V^T (x − µ),  V = [v_1 v_2 ... v_L]

Choose the eigenvectors with the L largest eigenvalues of S_w^{-1} S_b. These eigenvectors give the directions of maximum discrimination.
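A minimal sketch of the LDA computation from the last two slides: build S_w and S_b from labeled feature vectors, then take the leading eigenvectors of S_w^{-1} S_b. It assumes S_w is non-singular (see the limitations on the next slide); function and variable names are mine.

```python
import numpy as np

def fisherfaces(X, labels, L):
    """X: K x M feature matrix (e.g., PCA coefficients); labels: length-K array of class ids."""
    labels = np.asarray(labels)
    class_ids = np.unique(labels)
    class_means = np.array([X[labels == r].mean(axis=0) for r in class_ids])
    mu = class_means.mean(axis=0)                    # overall mean of the class means
    M = X.shape[1]
    Sw = np.zeros((M, M))
    Sb = np.zeros((M, M))
    for r, mur in zip(class_ids, class_means):
        Xr = X[labels == r] - mur
        Sw += Xr.T @ Xr                              # within-class scatter
        Sb += np.outer(mur - mu, mur - mu)           # between-class scatter
    # Eigenvectors of Sw^{-1} Sb (assumes Sw is invertible).
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1][:L]       # L largest eigenvalues
    return eigvecs[:, order].real                    # columns are the Fisherface directions
```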

Limitations of LDA
1. The matrix S_b has at most R − 1 nonzero eigenvalues. Thus the upper limit for the reduced LDA space is R − 1.
2. The matrix S_w^{-1} does not always exist. To guarantee that S_w is not singular, we need at least K = N + R training samples, which is not always practical.

What to do? Use PCA first, as sketched below.
- PCA is first applied to the data set to reduce its dimension:  x (dimension N)  →  y (dimension M).
- LDA is then applied to further reduce the dimension:  y (dimension M)  →  z (dimension L), with L ≤ R − 1.
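Putting the two stages together, a sketch of the PCA-then-LDA pipeline described above. It assumes the pca_fit and fisherfaces sketches from the earlier slides are in scope, and that M and L are chosen small enough for S_w to stay non-singular; all names are mine.

```python
import numpy as np

def pca_lda_features(train_images, labels, M, L):
    """train_images: K x P flattened face images; labels: length-K class ids."""
    mu, V_pca, _ = pca_fit(train_images, M)       # stage 1: dimension N -> M with PCA
    Y = (train_images - mu) @ V_pca               # PCA coefficient vectors y_k
    V_lda = fisherfaces(Y, labels, L)             # stage 2: dimension M -> L with LDA (L <= R-1)
    W = V_pca @ V_lda                             # combined projection
    return mu, W

# For a new image x, the final L-dimensional feature vector is z = (x - mu) @ W.
```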

Is LDA always better than PCA? No.
When the number of training samples is large and representative for each class, LDA outperforms PCA. If not, it is better to use PCA!
(Example with class-representative training samples, from the Purdue AR database.)