CS4495/6495 Introduction to Computer Vision 8B-L2 Principal Component Analysis (and its use in Computer Vision)
Principal Components
Principal components are all about the directions in a feature space along which the points have the greatest variance.
[Figure: scatter of points in the Wavelength 1 vs. Wavelength 2 plane; the right panel shows the same points with the PC 1 and PC 2 directions overlaid.]
Principal Components
The first PC is the direction of maximum variance. Technically (and mathematically) it is measured from the origin, but in practice we mean from the mean.
Subsequent PCs are orthogonal to the previous PCs and describe the maximum residual variance.
[Figure: the same Wavelength 1 vs. Wavelength 2 scatter with PC 1 and PC 2 drawn through the data.]
2D example: Fitting a line
$E(a, b, d) = \sum_i (a x_i + b y_i - d)^2$
$\frac{\partial E}{\partial d} = \sum_i -2\,(a x_i + b y_i - d) = 0
\;\Rightarrow\;
d = \frac{a}{n}\sum_i x_i + \frac{b}{n}\sum_i y_i = a\bar{x} + b\bar{y}$
2D example: Fitting a line
Substituting $d = a\bar{x} + b\bar{y}$ gives
$E = \sum_i \big[a(x_i - \bar{x}) + b(y_i - \bar{y})\big]^2 = \|B\,\mathbf{n}\|^2$
where
$B = \begin{bmatrix} x_1-\bar{x} & y_1-\bar{y} \\ x_2-\bar{x} & y_2-\bar{y} \\ \vdots & \vdots \\ x_n-\bar{x} & y_n-\bar{y} \end{bmatrix}, \qquad \mathbf{n} = \begin{bmatrix} a \\ b \end{bmatrix}$
So: minimize $\|B\,\mathbf{n}\|^2$ subject to $\|\mathbf{n}\|^2 = 1$. This is the axis of least inertia.
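A minimal NumPy sketch of this line fit, on made-up noisy data: the constrained minimization of $\|B\mathbf{n}\|^2$ is solved by taking the eigenvector of $B^T B$ with the smallest eigenvalue as the line normal $(a, b)$.

```python
import numpy as np

# Minimal sketch: total least squares line fit (axis of least inertia).
# Points are rows of P; (a, b) is the line normal, d the offset.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 0.5 * x + 2 + rng.normal(scale=0.3, size=x.size)   # made-up noisy line
P = np.column_stack([x, y])

mean = P.mean(axis=0)
B = P - mean                              # rows are (x_i - x_bar, y_i - y_bar)

# Minimize ||B n||^2 subject to ||n|| = 1:
# n is the eigenvector of B^T B with the SMALLEST eigenvalue.
evals, evecs = np.linalg.eigh(B.T @ B)    # eigh returns eigenvalues in ascending order
a, b = evecs[:, 0]                        # line normal
d = a * mean[0] + b * mean[1]             # offset: d = a*x_bar + b*y_bar

print(f"Fitted line: {a:.3f} x + {b:.3f} y = {d:.3f}")
```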
Sound familiar???
Another algebraic interpretation
Minimizing the sum of squared distances to the line is the same as maximizing the sum of squared projections onto that line.
[Figure: a point P, its projection $x^T P$ onto a line through the origin with unit direction $x$.]
Algebraic interpretation
Trick: how is the sum of squares of the projection lengths expressed in algebraic terms?
Stack the points as the rows of a matrix:
$B = \begin{bmatrix} P_1^T \\ P_2^T \\ \vdots \\ P_m^T \end{bmatrix}$
Then the projections of all the points onto the unit direction $x$ are the entries of the vector $Bx$, so $\sum_i (x^T p_i)^2 = \|Bx\|^2$.
Algebraic interpretation
Our goal: maximize $x^T B^T B x$ subject to $x^T x = 1$, since
$\sum_i (x^T p_i)^2 = (Bx)^T (Bx) = x^T B^T B\, x$
Algebraic interpretation
Maximize $E = x^T M x$ subject to $x^T x = 1$, where $M = B^T B$. Using a Lagrange multiplier $\lambda$:
$E' = x^T M x + \lambda (1 - x^T x)$
$\frac{\partial E'}{\partial x} = 2 M x - 2 \lambda x = 0$
$\Rightarrow\; M x = \lambda x$ (so $x$ is an eigenvector of $B^T B$)
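A quick numerical check of this result, on a made-up 2D point cloud: the eigenvector of $M = B^T B$ with the largest eigenvalue should beat any random unit vector at maximizing $x^T M x$.

```python
import numpy as np

# Check (made-up 2D data): the unit vector maximizing x^T M x, with M = B^T B,
# is the eigenvector of M with the largest eigenvalue.
rng = np.random.default_rng(1)
pts = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])  # anisotropic cloud
B = pts - pts.mean(axis=0)
M = B.T @ B

evals, evecs = np.linalg.eigh(M)           # ascending eigenvalues
x_best = evecs[:, -1]                      # eigenvector with the largest eigenvalue
best_val = x_best @ M @ x_best

# Compare against random unit vectors: none should beat the eigenvector.
random_vals = []
for _ in range(1000):
    v = rng.normal(size=2)
    v /= np.linalg.norm(v)
    random_vals.append(v @ M @ v)

print(best_val, max(random_vals))          # best_val >= every random value
```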
Yet another algebraic interpretation
If the data are about the origin (mean-subtracted):
$B^T B = \begin{bmatrix} \sum_i x_i^2 & \sum_i x_i y_i \\ \sum_i x_i y_i & \sum_i y_i^2 \end{bmatrix}$
So the principal components are the orthogonal directions of the covariance matrix of a set of points.
Yet one more algebraic interpretation
$B^T B = \sum_x x\,x^T$ if the data are about the origin; otherwise
$B^T B = \sum_x (x - \bar{x})(x - \bar{x})^T$ (a sum of outer products).
So the principal components are the orthogonal directions of the covariance matrix of a set of points.
Eigenvectors
How many eigenvectors are there? For real symmetric matrices of size N×N: except in degenerate cases when eigenvalues repeat, there are N distinct eigenvectors.
PCA: Eigenvalues
[Figure: 2D data scatter with the two principal directions drawn as arrows labeled λ₁ and λ₂.]
Dimensionality Reduction
[Figure: 2D points clustered along a direction v₁, with v₂ orthogonal to it.]
We can represent the orange points with only their v₁ coordinates.
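A small sketch of that idea on made-up 2D data: keep only the v₁ coordinate of each point and reconstruct. When the points lie near a line, very little is lost.

```python
import numpy as np

# Sketch: reduce made-up 2D points to their v1 coordinate, then reconstruct.
rng = np.random.default_rng(2)
t = rng.uniform(-5, 5, size=300)
pts = np.column_stack([t, 0.8 * t]) + rng.normal(scale=0.1, size=(300, 2))  # near a line

mean = pts.mean(axis=0)
cov = np.cov((pts - mean).T)
evals, evecs = np.linalg.eigh(cov)
v1 = evecs[:, -1]                          # direction of greatest variance

coords = (pts - mean) @ v1                 # one number per point (the v1 coordinate)
reconstructed = mean + np.outer(coords, v1)

print("mean reconstruction error:", np.mean(np.linalg.norm(pts - reconstructed, axis=1)))
```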
Higher Dimensions
Suppose each data point is N-dimensional. The variance of the data projected onto a unit direction v is
$\mathrm{var}(v) = \sum_x \big((x - \bar{x})^T v\big)^2 = v^T A\, v,
\quad\text{where } A = \sum_x (x - \bar{x})(x - \bar{x})^T$ (a sum of outer products).
The eigenvector with the largest eigenvalue λ captures the most variation among the training vectors x; the eigenvector with the smallest eigenvalue captures the least.
The space of all face images When viewed as vectors of pixel values, face images are extremely high-dimensional 100x100 image = 10,000 dimensions
The space of all face images However, relatively few 10,000-dimensional vectors correspond to valid face images We want to effectively model the subspace of face images
The space of all face images We want to construct a low-dimensional linear subspace that best explains the variation in the set of face images
Principal Component Analysis
Given: M data points x₁, …, x_M in R^d, where d is big.
We want directions u in R^d that capture most of the variation of the x_i. The coefficient of x_i along u would be u · (x_i − μ) = u^T(x_i − μ), where μ is the mean of the data points.
What unit vector u in R^d captures the most variance of the data?
Principal Component Analysis
Direction that maximizes the variance of the projected data:
$\mathrm{var}(u) = \frac{1}{M}\sum_{i=1}^{M} u^T(x_i - \mu)\,\big(u^T(x_i - \mu)\big)^T
= u^T\Big[\frac{1}{M}\sum_{i=1}^{M}(x_i - \mu)(x_i - \mu)^T\Big]u
= u^T \Sigma\, u$
where $u^T(x_i - \mu)$ is the projection of data point $x_i$ and $\Sigma$ is the covariance matrix of the data.
Principal Component Analysis
Direction that maximizes the variance of the projected data:
$\mathrm{var}(u) = u^T \Sigma\, u$
Since u is a unit vector and can be expressed as a linear combination of the eigenvectors of Σ, the direction u that maximizes the variance is the eigenvector associated with the largest eigenvalue of Σ.
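A minimal PCA sketch along these lines, on made-up higher-dimensional data: build the covariance matrix Σ, eigendecompose it, and keep the top-k directions. The function name and data are illustrative only.

```python
import numpy as np

def pca(X, k):
    """Minimal PCA sketch: X is (M, d) with one data point per row.

    Returns the mean, the top-k eigenvectors (as columns), and their eigenvalues.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = (Xc.T @ Xc) / X.shape[0]         # d x d covariance matrix
    evals, evecs = np.linalg.eigh(Sigma)     # ascending order
    order = np.argsort(evals)[::-1]          # largest eigenvalue first
    return mu, evecs[:, order[:k]], evals[order[:k]]

# Made-up data: 500 points in R^10 with most variance in a few directions.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 10)) * np.array([5, 3, 1, 0.5, 0.5, 0.1, 0.1, 0.1, 0.1, 0.1])

mu, U, lam = pca(X, k=3)
print("top-3 eigenvalues:", lam)
coeffs = (X - mu) @ U                        # k-dimensional representation of each point
```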
Principal Component Analysis
The direction that captures the maximum covariance of the data is the eigenvector corresponding to the largest eigenvalue of the data covariance matrix.
Furthermore, the top k orthogonal directions that capture the most variance of the data are the k eigenvectors corresponding to the k largest eigenvalues.
But first, we'll need the PCA trick for the case d ≫ M.
The dimensionality trick
Let Φ_i be the (very long, length-d) vector that is face image i with the mean image subtracted.
Define C = (1/M) Σ_i Φ_i Φ_i^T = A A^T (up to the 1/M factor, which does not change the eigenvectors), where A = [Φ₁ Φ₂ … Φ_M] is the matrix of faces and is d × M.
Note: C is a huge d × d matrix (remember, d is the length of the image vector).
The dimensionality trick
So C = A A^T is a huge matrix. But consider A^T A: it is only M × M, so finding its eigenvectors is easy.
Suppose v_i is an eigenvector of A^T A: $A^T A\, v_i = \lambda_i v_i$.
Premultiply by A: $A A^T (A v_i) = \lambda_i (A v_i)$.
So the vectors A v_i are eigenvectors of C = A A^T.
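A numerical sketch of the trick with made-up "images" (d = 10,000 pixels, M = 20 faces): eigendecompose the small M × M matrix A^T A, map each eigenvector up with A, and spot-check that the result is an eigenvector of A A^T.

```python
import numpy as np

# Sketch of the d >> M trick on made-up data.
rng = np.random.default_rng(4)
d, M = 10_000, 20
faces = rng.normal(size=(M, d))
mean_face = faces.mean(axis=0)
A = (faces - mean_face).T                 # d x M matrix of mean-subtracted faces

small = A.T @ A                            # M x M: cheap to eigendecompose
evals, V = np.linalg.eigh(small)           # columns of V are the v_i

U = A @ V                                  # each column A v_i is an eigenvector of A A^T
U /= np.linalg.norm(U, axis=0)             # normalize to unit length ("eigenfaces")

# Spot-check: (A A^T) u should equal lambda * u for the top eigenvector.
u, lam = U[:, -1], evals[-1]
print(np.allclose(A @ (A.T @ u), lam * u))
```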
How many eigenvectors are there?
If we had M > d there would be d of them. But here M ≪ d, so intuition would say there are M of them.
But wait: if we have just 2 points in 3D, how many meaningful directions of variance are there? And with 3 points?
Subtracting out the mean leaves only M − 1 eigenvectors with non-zero eigenvalues.
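A quick check of that count on made-up data: M mean-subtracted points give at most M − 1 non-zero eigenvalues.

```python
import numpy as np

# Quick check: M mean-subtracted points give at most M - 1 non-zero eigenvalues.
rng = np.random.default_rng(5)
d, M = 50, 5
X = rng.normal(size=(M, d))
Xc = X - X.mean(axis=0)                    # subtract the mean point

evals = np.linalg.eigvalsh(Xc @ Xc.T)      # eigenvalues of the small M x M matrix
print(np.sum(evals > 1e-10))               # prints 4, i.e. M - 1
```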
Eigenfaces: Key idea (Turk and Pentland, 1991)
Assume that most face images lie on a low-dimensional subspace determined by the first k (k ≪ d) directions of maximum variance.
Use PCA to determine the vectors, or "eigenfaces", u₁, u₂, …, u_k that span that subspace.
Represent all face images in the dataset as linear combinations of eigenfaces; find the coefficients by dot products.
Eigenfaces example
Training images x₁, …, x_M
Mean: μ. Top eigenvectors: u₁, …, u_k
Eigenfaces example Principal component (eigenvector) u k
Eigenfaces example
Principal component (eigenvector) u_k, visualized as μ + 3σ_k u_k and μ − 3σ_k u_k
Eigenfaces example
A face x in face-space coordinates (dot products):
$x \;\rightarrow\; [\,u_1^T(x - \mu),\; \ldots,\; u_k^T(x - \mu)\,] = [w_1, \ldots, w_k]$
This vector is the representation of the face.
Eigenfaces example
Reconstruction:
$x \approx \mu + w_1 u_1 + w_2 u_2 + w_3 u_3 + w_4 u_4 + \cdots$
[Figure: a face reconstructed as the mean face plus a weighted sum of eigenfaces.]
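A sketch of these two steps, projection to coefficients and reconstruction, on made-up "face" vectors; the eigenfaces here are built with the dimensionality trick from above, and all sizes are illustrative.

```python
import numpy as np

# Sketch: face-space coordinates and reconstruction, on made-up "face" vectors.
rng = np.random.default_rng(6)
d, M, k = 4_096, 30, 10                   # e.g. 64x64 images, 30 training faces, 10 eigenfaces
faces = rng.normal(size=(M, d))

mu = faces.mean(axis=0)
A = (faces - mu).T                        # d x M
evals, V = np.linalg.eigh(A.T @ A)        # small M x M eigenproblem (the d >> M trick)
U = A @ V[:, ::-1][:, :k]                 # top-k eigenfaces as columns, largest eigenvalue first
U /= np.linalg.norm(U, axis=0)

x = faces[0]                              # a "face" to encode
w = U.T @ (x - mu)                        # face-space coordinates [w_1, ..., w_k] (dot products)
x_hat = mu + U @ w                        # reconstruction: mu + sum_j w_j u_j

print("coefficients:", w[:3], "...")
print("reconstruction error:", np.linalg.norm(x - x_hat))
```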
Eigenfaces example But what about recognition?
Recognition with eigenfaces
Given a novel image x:
Project onto the subspace: [w₁, …, w_k] = [u₁^T(x − µ), …, u_k^T(x − µ)]
Optional: check the reconstruction error ‖x − x̂‖ to determine whether the image is really a face
Classify as the closest training face in the k-dimensional subspace
The ability to reconstruct the image is why this is a generative model.
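A minimal recognition sketch on made-up data, following the three steps above: project the novel image, optionally reject non-faces by reconstruction error, then nearest-neighbor classify in coefficient space. The threshold value and all data sizes are arbitrary placeholders.

```python
import numpy as np

# Minimal recognition sketch on made-up data (eigenfaces built as in the earlier sketches).
rng = np.random.default_rng(7)
d, M, k = 4_096, 30, 10
faces = rng.normal(size=(M, d))
labels = np.arange(M)                      # one identity per training image, for illustration

mu = faces.mean(axis=0)
A = (faces - mu).T
_, V = np.linalg.eigh(A.T @ A)
U = A @ V[:, ::-1][:, :k]                  # top-k eigenfaces as columns
U /= np.linalg.norm(U, axis=0)
train_w = (faces - mu) @ U                 # coefficients of every training face

def recognize(x, face_threshold=100.0):
    """Project a novel image, optionally reject non-faces, then nearest-neighbor classify."""
    w = U.T @ (x - mu)
    x_hat = mu + U @ w
    if np.linalg.norm(x - x_hat) > face_threshold:   # arbitrary threshold; tune on real data
        return None                                   # "not a face"
    dists = np.linalg.norm(train_w - w, axis=1)
    return labels[np.argmin(dists)]

# A noisy copy of training face 3 should come back labeled 3.
print(recognize(faces[3] + rng.normal(scale=0.01, size=d)))
```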
An old cast of characters
Limitations: PCA of global structure
PCA is a global appearance method: it is not robust to misalignment or background variation.
Limitations: PCA in general
The direction of maximum variance is not always good for classification.