COMP 408/508 Computer Vision, Fall 07: PCA for Recognition
Recall: Color Gradient by PCA
Stack the per-channel image gradients into the matrix

  D = [ (R_x, R_y); (G_x, G_y); (B_x, B_y) ]

Let v_1, v_2 be the eigenvectors of D^T D with v_1 ⊥ v_2, and λ_1, λ_2 the eigenvalues of D^T D with λ_1 > λ_2. The eigenvector v_1 associated with the larger eigenvalue gives the color gradient direction.
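As a quick sketch, the dominant color-gradient direction at a single pixel can be found from the 2×2 matrix D^T D. The channel gradient values below are made-up numbers for illustration, not from the slides:

```python
import numpy as np

# Hypothetical per-channel gradients at one pixel:
# rows are (R, G, B), columns are (d/dx, d/dy)
D = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.7, 0.0]])

S = D.T @ D                      # the 2x2 matrix D^T D
lam, V = np.linalg.eigh(S)       # eigh returns eigenvalues in ascending order
direction = V[:, -1]             # v1: color gradient direction (largest eigenvalue)
strength = np.sqrt(lam[-1])      # gradient magnitude along that direction
```

`np.linalg.eigh` is appropriate here because D^T D is symmetric; its eigenvalues are returned in ascending order, so the last column is v_1.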
Principal Component Analysis (PCA)
Let x_1, x_2, ..., x_K be K realizations of the random vector x of dimension N. PCA finds a set of M orthonormal vectors v_1, v_2, ..., v_M that best describes x in the mean-square-error sense (M < N). The problem is equivalent to minimizing the mean square error ξ as follows:

  ξ = (1/K) Σ_{i=1}^{K} ||x_i − x̂_i||² = E{ ||x_i − x̂_i||² },  where x̂_i = Σ_{m=1}^{M} (x_i^T v_m) v_m   (E: expected value)

This reduces to finding the M eigenvectors associated with the largest eigenvalues of the N×N correlation or covariance matrix C:

  C = (1/K) Φ Φ^T,  where Φ = [x_1 x_2 ... x_K]  or  Φ = [x_1 − μ  x_2 − μ  ...  x_K − μ],  with μ = (1/K) Σ_{k=1}^{K} x_k
Geometrical interpretation
PCA projects the data along the directions where the data varies the most. These directions are determined by the eigenvectors of the covariance matrix corresponding to the largest eigenvalues. The magnitude of each eigenvalue is proportional to the variance of the data along the corresponding eigenvector direction.
Covariance or Correlation?
With the correlation matrix:  C = (1/K) Φ Φ^T,  Φ = [x_1 x_2 ... x_K]  (describes the energy).
With the covariance matrix:   C = (1/K) Φ Φ^T,  Φ = [x_1 − μ  x_2 − μ  ...  x_K − μ]  (describes the variation).
Recursive Interpretation of PCA
Find the direction of the first principal component by

  v_1 = argmax_{||v||=1} E{ (v^T x)² }

Thus the first principal component is the projection onto the direction along which the energy (or the variance) of the projection is maximized. Having determined the first m−1 principal components, the m-th principal component is determined as the principal component of the residual:

  v_m = argmax_{||v||=1} E{ [ v^T ( x − Σ_{n=1}^{m−1} (v_n^T x) v_n ) ]² }
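The recursive definition can be sketched numerically: power iteration finds the leading direction, and deflation removes its energy before finding the next. This is a toy illustration on synthetic data, not an algorithm from the slides:

```python
import numpy as np

def leading_direction(C, iters=500):
    """Power iteration: converges to the eigenvector of C with the largest eigenvalue."""
    v = np.ones(C.shape[0]) / np.sqrt(C.shape[0])
    for _ in range(iters):
        v = C @ v
        v /= np.linalg.norm(v)
    return v

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3)) * np.array([3.0, 1.0, 0.2])  # anisotropic data
C = np.cov(X, rowvar=False)

components = []
R = C.copy()
for m in range(2):
    v = leading_direction(R)        # principal component of the current residual
    components.append(v)
    lam = v @ R @ v                 # energy (variance) along v
    R = R - lam * np.outer(v, v)    # deflate: residual covariance
```

Each pass maximizes the projected variance of what is left over, mirroring the argmax-over-the-residual formulation above.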
Where to use PCA? Everywhere! PCA is and can be used in various disciplines: dimension reduction, decorrelation, efficient and meaningful data representation, optimization. In computer vision: recognition, retrieval, compression, 2D or 3D feature representation and detection, fitting, etc.
PCA for Dimension Reduction
Suppose we have a high-dimensional feature vector x that represents the data. Significant improvements can be achieved by first mapping the data into a lower-dimensional space, both for recognition and for computational reasons. The curse of dimensionality is an important problem. The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variation present in the original dataset:

  x = Σ_{n=1}^{N} a_n u_n  (high dimension)   →   x̂ = Σ_{m=1}^{M} b_m v_m  (low dimension, M << N)

Dimensionality reduction implies information loss! Preserve as much information as possible, that is: minimize E(||x − x̂||²), the expected error.
PCA Methodology
The best low-dimensional space is determined by the "best" eigenvectors of the covariance matrix of x (i.e., the eigenvectors corresponding to the "largest" eigenvalues, also called "principal components"). Suppose we have K realizations x_1, x_2, ..., x_K:
1. Compute the symmetric N×N covariance matrix C.
2. Compute the eigenvalues: λ_1 ≥ λ_2 ≥ ... ≥ λ_N.
3. Compute the eigenvectors: v_1, v_2, ..., v_N.
4. Reduce the dimension, keeping only the terms corresponding to the M largest eigenvalues:
   x̂ − μ = Σ_{m=1}^{M} b_m v_m,   M << N
5. Form the new feature vector b = [b_1 b_2 ... b_M]^T, where b_m = (x − μ)^T v_m.
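The five steps above can be sketched with NumPy. This is a minimal illustration on synthetic data; the variable names are not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))  # K=200 samples, N=10 dims

# 1. symmetric N x N covariance matrix C
mu = X.mean(axis=0)
C = (X - mu).T @ (X - mu) / len(X)

# 2-3. eigenvalues and eigenvectors; eigh returns them in ascending order,
#      so flip to get lam_1 >= lam_2 >= ... >= lam_N
lam, V = np.linalg.eigh(C)
lam, V = lam[::-1], V[:, ::-1]

# 4. keep only the M eigenvectors with the largest eigenvalues
M = 3
V_M = V[:, :M]

# 5. new feature vector: b_m = (x - mu)^T v_m for each sample
B = (X - mu) @ V_M            # K x M matrix of PCA coefficients

# reconstruction: x_hat = mu + sum_m b_m v_m
X_hat = mu + B @ V_M.T
```

Note that the coefficients `B` have a diagonal covariance matrix, which anticipates the decorrelation property discussed next.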
Properties of PCA
PCA decorrelates the data; the new features b_i are uncorrelated, since their covariance matrix is diagonalized by the PCA transform:

  V^T C V = diag(λ_1, λ_2, ..., λ_M),   where V = [v_1 v_2 ... v_M]

The covariance matrix represents only second-order statistics among the vector values. Hence higher-order correlations may still remain after the PCA transformation.
Standardization
The principal components depend on the units used to measure the original variables as well as on the range of values they assume. We may need to standardize the data prior to using PCA. A common standardization method is to transform all the data to have zero mean and unit standard deviation: (x_n − μ_n) / σ_n.
How to choose M? Keep enough components to exceed a threshold fraction of the total variance:

  Σ_{m=1}^{M} λ_m / Σ_{n=1}^{N} λ_n > threshold (e.g., ~0.9)
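A sketch of the threshold rule, assuming the eigenvalues are supplied in any order and the slides' strict "greater than" comparison:

```python
import numpy as np

def choose_M(eigenvalues, threshold=0.9):
    """Smallest M whose M largest eigenvalues explain more than `threshold` of the total variance."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending
    ratio = np.cumsum(lam) / lam.sum()
    return int(np.argmax(ratio > threshold)) + 1

# cumulative ratios here are 0.5, 0.8, 0.9, 0.96, 1.0; the first one
# strictly above 0.9 is at position 4
M = choose_M([5.0, 3.0, 1.0, 0.6, 0.4])
```

Whether to use a strict or non-strict comparison is a convention choice; the slides write ">", so that is what the sketch implements.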
Example of PCA: Eigenfaces for Face Recognition
The eigenface technique was proposed by M. Turk and A. Pentland, 1991. Eigenfaces correspond to the eigenvectors of the covariance matrix of a set of face images. Every image can be expressed as a linear combination of eigenfaces. The eigenface coefficients constitute the reduced low-dimensional feature vector. It is an appearance-based (i.e., intensity-based) technique and works on grayscale images. Face images must be centered and of the same size, so face detection and normalization may be necessary. Note that pose and lighting are different!
Computation of Eigenfaces
Take a set of N×N training face images I_1, I_2, ..., I_K. Express each as a vector of size N²: x_1, x_2, ..., x_K. Compute the N²×N² covariance matrix C:

  C = (1/K) X X^T,   X = [x_1 − μ  ...  x_K − μ]

N²×N² is too large! Instead use the K×K matrix X^T X. The eigenvalues of X^T X are the same as the K largest eigenvalues of X X^T. The eigenvectors are computed as follows: v_k = X w_k, where w_k and v_k are the eigenvectors of X^T X and X X^T, respectively. Normalize each v_k such that ||v_k|| = 1. Keep the M largest eigenvalues and the associated eigenvectors, i.e., the eigenfaces.
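The K×K trick can be sketched as follows. Random pixel values stand in for real face images here, purely so the block is self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N2 = 20, 64 * 64                  # 20 training faces, 64x64 pixels each
faces = rng.random((K, N2))          # stand-in for vectorized face images

mu = faces.mean(axis=0)
X = (faces - mu).T                   # N^2 x K matrix of mean-subtracted faces

# Work with the small K x K matrix X^T X instead of the huge N^2 x N^2 one
lam, W = np.linalg.eigh(X.T @ X)     # ascending order
lam, W = lam[::-1], W[:, ::-1]       # sort descending

# Lift eigenvectors back up: v_k = X w_k, then normalize to unit length
V = X @ W                            # N^2 x K candidate eigenfaces
V /= np.linalg.norm(V, axis=0)

M = 10
eigenfaces = V[:, :M]                # keep the M largest
```

The lifted vectors are genuine eigenvectors of X X^T: if X^T X w = λ w, then X X^T (X w) = λ (X w).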
Representing Faces in Eigenspace
Each normalized face image vector x_k in the training set can be approximated as a linear combination of the M eigenfaces:

  x̂_k − μ = Σ_{m=1}^{M} b_m v_m

Hence each normalized face image vector x_k in the training set can be represented by an eigenface coefficient vector b_k = [b_1 ... b_M]^T.
Face Recognition using Eigenfaces
Given an unknown image vector x:
Project the mean-normalized vector onto the eigenspace and represent it by the eigenface coefficient vector b.
Find r = argmin_k ||b − b_k||.
If mindist < threshold (that is, the distance to the closest coefficient vector is less than a threshold), then the face is recognized as face r from the training set. Otherwise it is rejected.
It may be better to use a scaled Euclidean (Mahalanobis) distance:

  ||b − b_k||² = Σ_{m=1}^{M} (1/λ_m) (b_m − b_mk)²
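The matching step can be sketched directly from the formulas above. The coefficient values and threshold below are invented for illustration:

```python
import numpy as np

def recognize(b, train_coeffs, lam, threshold):
    """Nearest neighbor in eigenspace with the Mahalanobis-scaled distance.

    b            : (M,) coefficients of the unknown face
    train_coeffs : (K, M) coefficient vectors b_k of the training faces
    lam          : (M,) eigenvalues lambda_m used as scaling
    Returns the index of the matched training face, or -1 if rejected.
    """
    d2 = ((train_coeffs - b) ** 2 / lam).sum(axis=1)  # sum_m (b_m - b_mk)^2 / lam_m
    r = int(np.argmin(d2))
    return r if d2[r] < threshold else -1

train_coeffs = np.array([[1.0, 0.0],
                         [0.0, 2.0]])   # two known faces (made-up coefficients)
lam = np.array([1.0, 4.0])
match = recognize(np.array([0.9, 0.1]), train_coeffs, lam, threshold=0.5)
```

Dividing each squared difference by λ_m downweights the coefficients with high variance, exactly as in the scaled distance on the slide.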
How to choose M?
One alternative:  Σ_{m=1}^{M} λ_m / Σ_{n=1}^{N} λ_n > threshold (e.g., ~0.9)
Another alternative is to optimize M over test/validation data.
Problems of the Eigenface technique
Sensitive to rotation, scale, and translation. Sensitive to lighting variations. Background interference. Thus face images should be preprocessed to lessen the effects of possible variations. Variations such as lighting and rotation can also be taken into account during training: the training dataset may include samples with such variations.
Face detection
Can be thought of as a 2-class recognition problem: face or non-face. Different alternatives exist:
OpenCV has a face detection module based on Haar features. It exploits local features such as edges and line patterns, and scans a given image at different scales with a sliding window. It is scale, translation, and lighting invariant; however, it is sensitive to rotation.
Neural networks, SVMs, ...: these need lots of training samples (also for the non-face class).
Eigenface detection (a simpler approach): for example, use the approximation error due to dimension reduction as a "faceness" measure.
Multi-scale face detection with a sliding window. The sliding window size doesn't change across scales.
Face Detection using Eigenfaces
Given an unknown image vector x:
Project the mean-normalized vector onto the eigenspace and represent it by the eigenface coefficient vector b.
Compute the reconstruction error: e = ||x − x̂||.
If e < threshold, then the image is detected as a face; otherwise it is a non-face.
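The reconstruction-error "faceness" measure can be sketched with a toy eigenspace; the vectors below are made up to show that in-span inputs reconstruct well and off-span inputs do not:

```python
import numpy as np

def faceness_error(x, mu, eigenfaces):
    """Squared reconstruction error ||x - x_hat||^2 after projecting onto the eigenspace."""
    b = eigenfaces.T @ (x - mu)        # eigenface coefficients
    x_hat = mu + eigenfaces @ b        # reconstruction from the M eigenfaces
    return float(((x - x_hat) ** 2).sum())

# toy 2-D eigenspace inside a 4-D image space
mu = np.zeros(4)
eigenfaces = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [0.0, 0.0],
                       [0.0, 0.0]])
in_span  = np.array([0.5, -0.3, 0.0, 0.0])   # lies in the eigenspace: small error
off_span = np.array([0.0,  0.0, 1.0, 1.0])   # orthogonal to it: large error
```

A small error means the image is well explained by the face subspace, so thresholding `faceness_error` implements the detection rule above.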
Face Detection using Eigenfaces
Another possibility is to train an SVM classifier with the eigenface coefficients as features (see HW3).
PCA performance for recognition
PCA is not always an optimal dimensionality-reduction procedure for classification purposes: it preserves the directions of largest variance, which need not be the directions that separate the classes.
Linear Discriminant Analysis (LDA)
The objective of LDA is to perform dimensionality reduction while preserving as much of the class discriminatory information as possible. It seeks to find directions along which the classes are best separated. It does so by taking into consideration the scatter within classes as well as the scatter between classes. It is also more capable of distinguishing variations due to class identity from variations due to other sources, such as illumination and expression in face recognition.
Methodology
Suppose there are R classes. Let μ_r be the mean feature vector for class r, K_r the number of training samples from class r, and K = Σ_r K_r the total number of samples.
Within-class scatter matrix:   S_w = Σ_{r=1}^{R} Σ_{k=1}^{K_r} (x_k − μ_r)(x_k − μ_r)^T
Between-class scatter matrix:  S_b = Σ_{r=1}^{R} (μ_r − μ)(μ_r − μ)^T,   where μ = (1/R) Σ_{r=1}^{R} μ_r
LDA computes a transformation V that maximizes the between-class scatter while minimizing the within-class scatter:

  maximize  det(V^T S_b V) / det(V^T S_w V)

Such a transformation retains class separability while reducing the variation due to sources other than class identity (e.g., illumination).
LDA transformation (Fisherfaces)
The optimal linear transformation is given by a matrix V whose columns are the eigenvectors of S_w^{-1} S_b (called Fisherfaces in the case of face recognition):

  b = [b_1 b_2 ... b_L] = (x − μ)^T V,   V = [v_1 v_2 ... v_L]

Choose the eigenvectors with the L largest eigenvalues of S_w^{-1} S_b. These eigenvectors give the directions of maximum discrimination.
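The scatter matrices and the Fisher directions can be sketched on two toy classes that differ along the x axis but have large within-class spread along y (synthetic data, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# two classes separated along x, with large within-class variance along y
class0 = rng.normal(loc=[0.0, 0.0], scale=[0.3, 2.0], size=(100, 2))
class1 = rng.normal(loc=[3.0, 0.0], scale=[0.3, 2.0], size=(100, 2))
classes = [class0, class1]

mu_r = [c.mean(axis=0) for c in classes]
mu = np.mean(mu_r, axis=0)

# within-class and between-class scatter matrices
Sw = sum((c - m).T @ (c - m) for c, m in zip(classes, mu_r))
Sb = sum(np.outer(m - mu, m - mu) for m in mu_r)

# Fisher directions: eigenvectors of Sw^{-1} Sb with the largest eigenvalues
lam, V = np.linalg.eig(np.linalg.solve(Sw, Sb))
order = np.argsort(lam.real)[::-1]
w = V[:, order[0]].real          # most discriminative direction
```

PCA on this data would pick the high-variance y axis, whereas the Fisher direction `w` comes out along x, where the classes actually separate.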
Limitations of LDA
1. The matrix S_b has at most R−1 nonzero eigenvalues. Thus the upper limit for the dimension of the reduced LDA space is R−1.
2. The matrix S_w^{-1} does not always exist. To guarantee that S_w is not singular, we need at least K = N + R training samples, which is not always practical.
What to do? Use PCA:
PCA is first applied to the data set to reduce its dimension: x (N-dim) → y (M-dim).
LDA is then applied to further reduce the dimension: y (M-dim) → z (L-dim).
Is LDA always better than PCA? No. When the number of training samples is large and representative for each class, LDA outperforms PCA. If not, it is better to use PCA! An example of class-representative training samples (from the Purdue AR database).