Linear Subspace Models


Goal: Explore linear models of a data set.

Motivation: A central question in vision concerns how we represent a collection of data vectors. The data vectors may be rasterized images, for example. We consider the construction of low-dimensional bases for an ensemble of training data using principal components analysis (PCA). We introduce PCA, its derivation, its properties, and some of its uses, and very briefly critique its suitability for object detection.

Readings: Sections 22.1-22.3 of Forsyth and Ponce.

Matlab Tutorials: colourtutorial.m, traineigeneyes.m, and detecteigeneyes.m

2503: Linear Subspace Models, © A.D. Jepson & D.J. Fleet, 2009

Representing Images of Human Eyes

Question: Suppose we have a dataset of scaled, centered images of human eyes. How can we find an efficient representation of such a data set?

[Figure: sample images from the dataset, in two panels labelled Left Eyes and Right Eyes.]

Generative Model: Suppose we can approximate each image in the data set with a parameterized model of the form

    I(x) \approx g(x; a),

where a is a vector of coefficients. Possible uses:
- compress or reduce the dimension of the data set,
- generate novel instances,
- (possibly) recognition.

Subspace Appearance Models

Idea: Images are not random, especially those of an object, or of similar objects, under different viewing conditions. Rather than storing every image, we might try to represent the images more effectively, e.g., in a lower-dimensional subspace. For example, let's represent each N x N image as a point in an N^2-dimensional vector space (e.g., ordering the pixels lexicographically to form the vectors).

[Figure: images as points in image space; red points denote images, blue vectors denote image differences.]

How do we find a low-dimensional basis to accurately model (approximate) each image of the training ensemble (as a linear combination of basis images)?

Linear Subspace Models

We seek a linear basis with which each image in the ensemble is approximated as a linear combination of basis images b_k(x):

    I(x) \approx m(x) + \sum_{k=1}^{K} a_k b_k(x),    (1)

where m(x) is the mean of the image ensemble. The subspace coefficients a = (a_1, ..., a_K) comprise the representation.

With some abuse of notation, assuming basis images b_k(x) with N^2 pixels, let's define
- b_k: an N^2 x 1 vector with pixels arranged in lexicographic order,
- B: a matrix with columns b_k, i.e., B = [b_1, ..., b_K] \in R^{N^2 \times K}.

With this notation we can rewrite Eq. (1) in matrix algebra as

    I \approx m + B a.    (2)

In what follows, we assume that the mean of the ensemble is 0. (Otherwise, if the ensemble is not mean zero, we can estimate the mean and subtract it from each image.)
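
To make Eqs. (1) and (2) concrete, here is a minimal NumPy sketch (an illustration added to these notes, separate from the course's Matlab tutorials). It rasterizes a small synthetic ensemble of N x N images into N^2-dimensional column vectors, stacks them into a data matrix, and subtracts the estimated mean so the zero-mean assumption holds; the sizes N and L are arbitrary stand-ins.

    import numpy as np

    rng = np.random.default_rng(0)

    # A small synthetic ensemble: L images, each N x N (stand-ins for real images).
    N, L = 8, 100
    images = rng.standard_normal((L, N, N))

    # Rasterize each image lexicographically into an N^2-dimensional column vector
    # and stack them as the columns of a data matrix A (shape N^2 x L).
    A = images.reshape(L, N * N).T

    # Estimate the ensemble mean and subtract it, so the model is I ~= m + B a
    # with a zero-mean residual ensemble, as assumed in the notes.
    m = A.mean(axis=1, keepdims=True)
    A_centered = A - m
    print(A_centered.shape, np.allclose(A_centered.mean(axis=1), 0.0))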

Choosing The Basis

Orthogonality: Let's assume orthonormal basis functions,

    \|b_k\|_2 = 1,    b_j^T b_k = \delta_{jk}.

Subspace Coefficients: It follows from the linear model in Eq. (2) and the orthogonality of the basis functions that

    b_k^T I \approx b_k^T B a = b_k^T [b_1, ..., b_K] a = a_k.

This selection of coefficients, a = B^T I, minimizes the sum of squared errors (or sum of squared pixel differences, SSD):

    \min_{a \in R^K} \|I - B a\|_2^2.

Basis Images: In order to select the basis functions \{b_k\}_{k=1}^{K}, suppose we have a training set of images \{I_l\}_{l=1}^{L}, with L \gg K. (Recall we are assuming the images are mean zero.) Finally, let's select the basis \{b_k\}_{k=1}^{K} to minimize the squared reconstruction error

    \sum_{l=1}^{L} \min_{a_l} \|I_l - B a_l\|_2^2.
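
A short sketch (illustrative, on random data) of the claim that for an orthonormal basis B the coefficients a = B^T I minimize the SSD: it compares the projection against a general least-squares solve and reports the residual.

    import numpy as np

    rng = np.random.default_rng(1)
    n_pixels, K = 64, 5

    # Build an orthonormal basis B (N^2 x K) from a random matrix via QR.
    B, _ = np.linalg.qr(rng.standard_normal((n_pixels, K)))

    I = rng.standard_normal(n_pixels)        # a (mean-subtracted) image vector

    # Orthonormal projection: a = B^T I.
    a = B.T @ I

    # The same coefficients from a general least-squares fit of I ~= B a.
    a_lstsq, *_ = np.linalg.lstsq(B, I, rcond=None)
    assert np.allclose(a, a_lstsq)

    ssd = np.sum((I - B @ a) ** 2)           # residual sum of squared pixel errors
    print(ssd)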

Intuitions

Example: Let's consider a set of images \{I_l\}_{l=1}^{L}, each with only two pixels. So each image can be viewed as a 2D point, I_l \in R^2.

[Figure: the two-pixel images plotted as a cloud of 2D points.]

For a model with only one basis image, what should b_1 be?

Approach: Fit an ellipse to the distribution of the image data, and choose b_1 to be a unit vector in the direction of the major axis. Define the ellipse as x^T C^{-1} x = 1, where C is the sample covariance matrix of the image data,

    C = \frac{1}{L} \sum_{l=1}^{L} I_l I_l^T.

The eigenvectors of C provide the major axis, i.e., C U = U D for an orthogonal matrix U = [u_1, u_2] and a diagonal matrix D with elements d_1 \geq d_2 \geq 0. The direction u_1 associated with the largest eigenvalue is the direction of the major axis, so let b_1 = u_1.
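
The two-pixel intuition is easy to reproduce numerically. Below is a sketch (synthetic points standing in for the figure's data): form the sample covariance C = (1/L) \sum_l I_l I_l^T, eigendecompose it, and take b_1 to be the eigenvector with the largest eigenvalue.

    import numpy as np

    rng = np.random.default_rng(2)
    L = 500

    # Synthetic two-pixel "images": an elongated 2D point cloud.
    direction = np.array([2.0, 1.0]) / np.sqrt(5.0)
    points = rng.standard_normal((L, 1)) * 3.0 * direction + 0.3 * rng.standard_normal((L, 2))

    A = points.T                           # 2 x L data matrix
    A = A - A.mean(axis=1, keepdims=True)  # work with a zero-mean ensemble

    C = (A @ A.T) / L                      # sample covariance, C = (1/L) sum_l I_l I_l^T
    d, U = np.linalg.eigh(C)               # eigendecomposition C U = U D (ascending order)
    b1 = U[:, np.argmax(d)]                # unit vector along the major axis
    print("eigenvalues:", d, "b1:", b1)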

Principal Components Analysis

Theorem (Minimum reconstruction error): The orthogonal basis B, of rank K < N^2, that minimizes the squared reconstruction error over the training data \{I_l\}_{l=1}^{L}, i.e.,

    \sum_{l=1}^{L} \min_{a_l} \|I_l - B a_l\|_2^2,

is given by the first K eigenvectors of the data covariance matrix

    C = \frac{1}{L} \sum_{l=1}^{L} I_l I_l^T \in R^{N^2 \times N^2},

for which C U = U D, where U = [u_1, ..., u_{N^2}] is orthogonal and D = diag(d_1, ..., d_{N^2}) with d_1 \geq d_2 \geq ... \geq d_{N^2}. That is, the optimal basis vectors are b_k = u_k, for k = 1, ..., K. The corresponding basis images \{b_k(x)\}_{k=1}^{K} are often called eigen-images.

Proof: see the derivation below.
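
A minimal sketch of the theorem's recipe (illustrative, on synthetic zero-mean vectors rather than images): build C, take its first K eigenvectors as the basis B, and measure the resulting squared reconstruction error, which should equal L times the sum of the discarded eigenvalues.

    import numpy as np

    rng = np.random.default_rng(3)
    n_pixels, L, K = 64, 200, 10

    # Zero-mean synthetic training vectors as the columns of A (N^2 x L).
    A = rng.standard_normal((n_pixels, L))
    A -= A.mean(axis=1, keepdims=True)

    C = (A @ A.T) / L                       # data covariance, N^2 x N^2
    d, U = np.linalg.eigh(C)                # ascending eigenvalues
    order = np.argsort(d)[::-1]             # reorder so that d_1 >= d_2 >= ...
    d, U = d[order], U[:, order]

    B = U[:, :K]                            # first K eigenvectors: the PCA basis
    recon = B @ (B.T @ A)                   # rank-K subspace reconstruction
    err = np.sum((A - recon) ** 2)
    print("squared reconstruction error:", err)
    print("L * sum of discarded eigenvalues:", L * d[K:].sum())   # should match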

Derivation of PCA

To begin, we want to find B in order to minimize the squared error in the subspace approximations to the images of the training ensemble,

    E = \sum_{l=1}^{L} \min_{a_l} \|I_l - B a_l\|_2^2.

Given the assumption that the columns of B are orthonormal, the optimal coefficients are a_l = B^T I_l, so

    E = \sum_{l=1}^{L} \min_{a_l} \|I_l - B a_l\|_2^2 = \sum_{l=1}^{L} \|I_l - B B^T I_l\|_2^2.    (3)

Furthermore, writing each training image as a column of the matrix A = [I_1, ..., I_L], we have

    E = \sum_{l=1}^{L} \|I_l - B B^T I_l\|_2^2 = \|A - B B^T A\|_F^2 = trace[A A^T] - trace[B^T A A^T B].

You get this last step by expanding the square, noting B^T B = I_K, and using the properties of the trace, e.g., trace[A] = trace[A^T], and trace[B^T A A^T B] = trace[A^T B B^T A]. So, to minimize the squared error in the approximation, we want to find B to maximize the second term,

    E' = trace[B^T A A^T B].    (4)

Now, let's use the fact that the data covariance is C = \frac{1}{L} A A^T. Moreover, as defined above, the SVD (equivalently, the eigendecomposition) of the symmetric matrix C can be written as C = U D U^T. Substituting this into E' gives

    E' = trace[B^T U D U^T B],    (5)

where of course U is orthogonal and D is diagonal. Now we just have to show that we should choose B such that the trace keeps the first K elements of D, thereby maximizing E'. Intuitively, note that B^T U must be rank K since B is rank K, the diagonal elements of D are ordered, and the trace is invariant under rotation. So the largest value of E' we can hope to get is obtained by choosing B so that, combined with U, we keep the first K diagonal elements of D. That is, the columns of B should be the first K columns of U. We need to make this a little more rigorous, but that's it for now...
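
To sanity-check the trace argument numerically, the following sketch (added for illustration, random data only) compares trace[B^T A A^T B] for B built from the top K eigenvectors against a random orthonormal B of the same size; the eigenvector choice should never be smaller.

    import numpy as np

    rng = np.random.default_rng(4)
    n_pixels, L, K = 30, 100, 4

    A = rng.standard_normal((n_pixels, L))
    A -= A.mean(axis=1, keepdims=True)

    M = A @ A.T                                     # proportional to the covariance C
    d, U = np.linalg.eigh(M)
    U = U[:, np.argsort(d)[::-1]]                   # eigenvectors, descending eigenvalues

    B_pca = U[:, :K]                                # top-K eigenvectors
    B_rand, _ = np.linalg.qr(rng.standard_normal((n_pixels, K)))  # random orthonormal basis

    trace_pca = np.trace(B_pca.T @ M @ B_pca)
    trace_rand = np.trace(B_rand.T @ M @ B_rand)
    print(trace_pca >= trace_rand, trace_pca, trace_rand)   # expect True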

Other Properties of PCA

Maximum Variance: The K-dimensional subspace approximation captures the greatest possible variance in the training data.
- For a_1 = b_1^T I, the direction b_1 that maximizes the variance E[a_1^2] = b_1^T C b_1, subject to b_1^T b_1 = 1, is the first eigenvector of C. The second direction maximizes b_2^T C b_2 subject to b_2^T b_2 = 1 and b_1^T b_2 = 0.
- For a_k = b_k^T I and a = (a_1, ..., a_K), the covariance of the subspace coefficients is E[a a^T] = diag(d_1, ..., d_K). That is, the diagonal entries of D are the marginal variances of the subspace coefficients: \sigma_k^2 \equiv E[a_k^2] = d_k. So the total variance captured in the subspace is the sum of the first K eigenvalues of C.
- The total variance lost owing to the subspace projection (i.e., the out-of-subspace variance) is the sum of the last N^2 - K eigenvalues:

    \frac{1}{L} \sum_{l=1}^{L} \min_{a_l} \|I_l - B a_l\|_2^2 = \sum_{k=K+1}^{N^2} \sigma_k^2.

Decorrelated Coefficients: C is diagonalized by its eigenvectors, so D is diagonal and the subspace coefficients are uncorrelated. Under a Gaussian model of the images (i.e., images drawn from an N^2-dimensional Gaussian pdf), this means that the coefficients are also statistically independent.
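
The decorrelation property can be checked directly: with B the top-K eigenvectors, the sample covariance of the coefficients a_l = B^T I_l should be diag(d_1, ..., d_K). A small synthetic check, added here for illustration:

    import numpy as np

    rng = np.random.default_rng(5)
    n_pixels, L, K = 40, 500, 6

    A = rng.standard_normal((n_pixels, L))
    A -= A.mean(axis=1, keepdims=True)

    C = (A @ A.T) / L
    d, U = np.linalg.eigh(C)
    order = np.argsort(d)[::-1]
    d, U = d[order], U[:, order]
    B = U[:, :K]

    coeffs = B.T @ A                         # K x L matrix of subspace coefficients
    coeff_cov = (coeffs @ coeffs.T) / L      # sample covariance of the coefficients

    print(np.allclose(coeff_cov, np.diag(d[:K])))     # uncorrelated, with variances d_k
    print("variance captured:", d[:K].sum(), "variance lost:", d[K:].sum())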

PCA and Singular Value Decomposition

The singular value decomposition of the data matrix A = [I_1, ..., I_L], A \in R^{N^2 \times L} (where usually L \ll N^2), is given by

    A = U S V^T,

where U \in R^{N^2 \times L}, S \in R^{L \times L}, and V \in R^{L \times L}. The columns of U and V are orthonormal, i.e., U^T U = I_{L \times L} and V^T V = I_{L \times L}, and the matrix S is diagonal, S = diag(s_1, ..., s_L), with s_1 \geq s_2 \geq ... \geq s_L \geq 0.

Theorem: The best rank-K approximation to A under the Frobenius norm is

    \tilde{A} = \sum_{k=1}^{K} s_k u_k v_k^T = B B^T A, where B = [u_1, ..., u_K],

for which

    \min_{rank(\tilde{A}) = K} \|A - \tilde{A}\|_F^2 = \sum_{k=K+1}^{L} s_k^2.

\tilde{A} is also the best rank-K approximation under the L_2 matrix norm.

What's the relation to PCA and the covariance of the training images?

    C = \frac{1}{L} \sum_{l=1}^{L} I_l I_l^T = \frac{1}{L} A A^T = \frac{1}{L} U S V^T V S^T U^T = \frac{1}{L} U S^2 U^T.

So the squared singular values of A are proportional to the first L eigenvalues of C,

    d_k = \frac{1}{L} s_k^2  for k = 1, ..., L,    and    d_k = 0  for k > L,

and the singular vectors of A are just the first L eigenvectors of C.
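
In practice the SVD route is convenient because it avoids forming the N^2 x N^2 covariance. A hedged sketch (synthetic data; the covariance is formed here only to verify the stated relations):

    import numpy as np

    rng = np.random.default_rng(6)
    n_pixels, L = 100, 30          # typical case: L << N^2

    A = rng.standard_normal((n_pixels, L))
    A -= A.mean(axis=1, keepdims=True)

    # Thin SVD: A = U S V^T with U of shape N^2 x L.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Eigenvalues of C = (1/L) A A^T, in descending order (verification only).
    C = (A @ A.T) / L
    d = np.sort(np.linalg.eigvalsh(C))[::-1]

    print(np.allclose(d[:L], s**2 / L))    # d_k = s_k^2 / L for k <= L
    print(np.allclose(d[L:], 0.0))         # remaining eigenvalues are zero
    # The leading K columns of U are the PCA basis vectors b_k (up to sign).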

Eigen-Images for Generic Images?

Fourier components are eigenfunctions of generic image ensembles. Why? Covariance matrices for stationary processes are Toeplitz.

PCA yields unique eigen-images up to rotations of invariant subspaces (e.g., Fourier components with the same marginal variance).

Eigen-Reflectances

Consider an ensemble of surface reflectances r(\lambda).

[Figure: various Munsell reflectances; reflectivity versus wavelength (nm) over the visible spectrum, roughly 400-700 nm.]

What is the effective dimension of these reflectances? Define V_k \equiv \sum_{j=1}^{k} \sigma_j^2. Then the fraction of the total variance explained by the first k PCA components is Q_k \equiv V_k / V_L.

[Figure: fraction of explained variance Q_k versus principal subspace dimension k.]

Reflectances r(\lambda), for wavelengths \lambda within the visible spectrum, are effectively 3-dimensional (see colourtutorial.m).
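
The effective-dimension computation is simple to reproduce. Below is a sketch using a synthetic, approximately 3-dimensional ensemble as a stand-in for the Munsell reflectances (the 0.99 threshold is an arbitrary choice for illustration): it computes Q_k = V_k / V_L from singular values and reports the smallest k for which Q_k exceeds the threshold.

    import numpy as np

    rng = np.random.default_rng(7)
    n_wavelengths, L = 31, 200

    # Synthetic "reflectance" ensemble: approximately 3-dimensional plus small noise.
    basis = rng.standard_normal((n_wavelengths, 3))
    A = basis @ rng.standard_normal((3, L)) + 0.01 * rng.standard_normal((n_wavelengths, L))
    A -= A.mean(axis=1, keepdims=True)

    s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
    V = np.cumsum(s**2)                      # V_k = sum_{j<=k} s_j^2
    Q = V / V[-1]                            # Q_k = V_k / V_L

    threshold = 0.99
    k_eff = int(np.argmax(Q >= threshold)) + 1
    print("effective dimension (Q_k >= %.2f): k = %d" % (threshold, k_eff))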

Eye Subspace Model

Subset of 1196 eye images (25 x 20 pixels):

[Figure: sample eye images, in two panels labelled Left Eyes and Right Eyes.]

Defn: Let V_k \equiv \sum_{j=1}^{k} s_j^2, dQ_k \equiv s_k^2 / V_L, and Q_k \equiv V_k / V_L.

[Figure: two plots versus singular value index k.] The left plot shows dQ_k, the fraction of the total variance contributed by the k-th principal component. The right plot shows Q_k, the fraction of the total variance captured by the subspace formed from the first k principal components.

Eye Subspace Model

Mean Eye:

[Figure: the mean eye image.]

Basis Images (1-6, and 10:5:35):

[Figure: selected eigen-images of the eye dataset.]

Reconstructions (for K = 5, 20, 50):

[Figure: two eye images, each shown alongside its reconstructions with K = 5, 20, and 50 basis images.]

Generative Eye Model

Generative model, M, for random eye images:

    I = m + \left( \sum_{k=1}^{K} a_k b_k \right) + e,

where m is the mean eye image, a_k \sim N(0, \sigma_k^2), \sigma_k^2 is the sample variance associated with the k-th principal direction in the training data, and e \sim N(0, \sigma_e^2 I_{N^2}), where

    \sigma_e^2 = \frac{1}{N^2} \sum_{k=K+1}^{N^2} \sigma_k^2

is the per-pixel out-of-subspace variance.

Random Eye Images:

[Figure: random draws from the generative model, with K = 5, 10, 20, 50, 100, 200.]

So the probability of an image given this model M is

    p(I | M) = \left( \prod_{k=1}^{K} p(a_k | M) \right) p(e | M),

where

    p(a_k | M) = \frac{1}{\sqrt{2\pi} \sigma_k} e^{-a_k^2 / (2\sigma_k^2)},    p(e | M) = \prod_{j=1}^{N^2} \frac{1}{\sqrt{2\pi} \sigma_e} e^{-e_j^2 / (2\sigma_e^2)}.
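
A minimal sketch of sampling from this generative model (illustrative only: a random orthonormal basis and made-up variances stand in for the learned eye model): draw a_k ~ N(0, sigma_k^2), draw per-pixel noise e ~ N(0, sigma_e^2 I), and form I = m + B a + e.

    import numpy as np

    rng = np.random.default_rng(8)
    n_pixels, K = 25 * 20, 20                      # stand-in for 25 x 20 eye images

    # Stand-ins for the learned model: mean image, orthonormal basis, variances.
    m = rng.standard_normal(n_pixels)
    B, _ = np.linalg.qr(rng.standard_normal((n_pixels, K)))
    sigma_k = 1.0 / np.arange(1, K + 1)            # hypothetical decaying std. deviations
    sigma_e = 0.05                                 # hypothetical per-pixel residual std. dev.

    def sample_image():
        a = sigma_k * rng.standard_normal(K)           # a_k ~ N(0, sigma_k^2)
        e = sigma_e * rng.standard_normal(n_pixels)    # e ~ N(0, sigma_e^2 I)
        return m + B @ a + e

    samples = np.stack([sample_image() for _ in range(6)])
    print(samples.shape)                           # 6 random "eye" images, rasterized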

Face Detection

The widespread use of PCA for object recognition began with the work of Turk and Pentland (1991) on face detection and recognition. Shown below is the model learned from a collection of frontal faces, normalized for contrast, scale, and orientation, with the backgrounds removed prior to PCA.

[Figure: the mean face image (upper left) and the first 15 eigen-images.]

The first three eigen-images show strong variations caused by illumination. The next few appear to correspond to the occurrence of certain features (hair, hairline, beard, clothing, etc.).

Object Recognition

Murase and Nayar (1995):
- images of multiple objects, taken from different positions on the viewsphere;
- each object occupies a manifold in the subspace (as a function of position on the viewsphere);
- recognition: nearest neighbour, assuming dense sampling of object pose variations in the training set.
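
With a learned basis, the recognition step reduces to nearest neighbour in coefficient space. A hedged sketch (synthetic images and placeholder labels, not the Murase-Nayar data): project a query onto the subspace and return the label of the closest stored training coefficient vector.

    import numpy as np

    rng = np.random.default_rng(9)
    n_pixels, L, K = 64, 50, 8

    # Training images (columns), placeholder labels, and a PCA basis from their SVD.
    A = rng.standard_normal((n_pixels, L))
    m = A.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(A - m, full_matrices=False)
    B = U[:, :K]

    train_coeffs = B.T @ (A - m)                   # K x L stored coefficient vectors
    labels = rng.integers(0, 5, size=L)            # placeholder object labels

    def recognize(query):
        """Project a query image into the subspace; return the nearest neighbour's label."""
        a = B.T @ (query.reshape(-1, 1) - m)       # K x 1 query coefficients
        dists = np.linalg.norm(train_coeffs - a, axis=0)
        return labels[int(np.argmin(dists))]

    query = A[:, 3] + 0.01 * rng.standard_normal(n_pixels)
    print(recognize(query), labels[3])             # should agree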

Summary

PCA finds the subspace (of a specified dimension) that maximizes the projected signal variance.

The generative model: A single Gaussian model is naturally associated with a PCA representation. The principal axes are the principal directions of the Gaussian's covariance.

Issues:
- The single Gaussian model is often rather crude. PCA coefficients can exhibit significantly more structure (cf. Murase & Nayar).
- As a result of this unmodelled structure, detectors based on single Gaussian models are often poor. See the Matlab tutorial detecteigeneyes.m.
- We discuss alternative detection strategies later in this course.

Further Readings

Belhumeur et al. (1997) Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. PAMI, 19(7):711-720.
Black, M. and Jepson, A. (1998) EigenTracking: Robust matching and tracking of articulated objects using a view-based representation. Int. J. Computer Vision, 26(1):63-84.
Chennubhotla, C. and Jepson, A.D. (2001) Sparse PCA (S-PCA): Extracting multi-scale structure from data. Proc. IEEE ICCV, Vancouver, pp. 641-647.
Cootes, T., Edwards, G., and Taylor, C.J. (1998) Active appearance models. Proc. ECCV.
Golub, G. and van Loan, C. (1984) Matrix Computations. Johns Hopkins Press, Baltimore.
Moghaddam, B., Jebara, T., and Pentland, A. (2000) Bayesian face recognition. Pattern Recognition, 33(11):1771-1782.
Murase, H. and Nayar, S. (1995) Visual learning and recognition of 3D objects from appearance. Int. J. Computer Vision, 14:5-24.
Turk, M. and Pentland, A. (1991) Face recognition using eigenfaces. J. Cognitive Neuroscience, 3(1):71-86.
Viola, P. and Jones, M.J. (2004) Robust real-time face detection. Int. J. Computer Vision.