Covariance and Principal Components

COMP3204/COMP6223: Computer Vision Covariance and Principal Components Jonathon Hare jsh2@ecs.soton.ac.uk

Variance and Covariance

Random Variables and Expected Values Mathematicians talk about variance (and covariance) in terms of random variables and expected values. A random variable is a variable that takes on different values due to chance. The set of sample values from a single dimension of a feature space can be considered to be a random variable. The expected value (denoted E[X]) is the most likely value a random variable will take. For this course we'll assume that the values an element of a feature can take are all equally likely. The expected value is thus just the mean value.

Variance Variance (σ²) is the mean squared difference from the mean (μ). It's a measure of how spread-out the data is. For a set of n data points x = [x_1, x_2, ..., x_n]: σ²(x) = (1/n) Σ_i (x_i - μ)². Technically it's E[(X - E[X])²].

Covariance Covariance (σ(x,y)) measures how two variables (x and y) change together: σ(x,y) = (1/n) Σ_i (x_i - μ_x)(y_i - μ_y). Technically it's E[(x - E[x])(y - E[y])].

Covariance The variance is the covariance when the two variables are the same: σ(x,x) = σ²(x).

Covariance A covariance of 0 means the variables are uncorrelated. (Covariance is related to correlation; see notes.)
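
As a concrete illustration (not from the slides), both definitions can be computed directly with NumPy; the data values here are made up:

    import numpy as np

    # Two made-up 1-D variables with n samples each
    x = np.array([2.0, 4.0, 6.0, 8.0])
    y = np.array([1.0, 3.0, 2.0, 5.0])
    n = len(x)

    mu_x, mu_y = x.mean(), y.mean()
    var_x = np.sum((x - mu_x) ** 2) / n           # σ²(x): mean squared difference from the mean
    cov_xy = np.sum((x - mu_x) * (y - mu_y)) / n  # σ(x,y): how x and y change together

    # σ(x,x) equals σ²(x); np.cov with bias=True uses the same 1/n convention
    print(var_x, cov_xy)
    print(np.cov(x, y, bias=True))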

Covariance Matrix A covariance matrix, Σ, encodes how all possible pairs of dimensions in an n-dimensional dataset (the vectors in a feature space), x, vary together: Σ_ij = σ(x_i, x_j), where x_i refers to the i-th element of all the vectors in the feature space. Note that σ(x,y) = σ(y,x).

Covariance Matrix The covariance matrix is a square symmetric matrix.

Demo: 2D covariance

Mean Centring Mean Centring is the process of computing the mean (across each dimension independently) of a set of vectors, and then subtracting the mean vector from every vector in the set. All the vectors will be translated so their average position is the origin

Covariance matrix again Form the mean-centred feature vectors into a matrix Z, where each row is a mean-centred feature vector. Then Σ ∝ ZᵀZ.
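
A minimal NumPy sketch of this construction (the data here is purely illustrative):

    import numpy as np

    X = np.random.randn(5, 3)        # illustrative data: 5 feature vectors in 3 dimensions
    Z = X - X.mean(axis=0)           # mean-centre each dimension independently
    cov = (Z.T @ Z) / len(Z)         # Σ ∝ ZᵀZ; dividing by n gives the covariance matrix

    # matches NumPy's estimator with the 1/n convention
    assert np.allclose(cov, np.cov(X, rowvar=False, bias=True))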

Principal axes of variation

Basis A basis is a set of n linearly independent vectors in an n-dimensional space. The vectors are orthogonal. They form a coordinate system. There are an infinite number of possible bases.

The first principal axis For a given set of n-dimensional data, the first principal axis (or just principal axis) is the vector that describes the direction of greatest variance.

The second principal axis The second principal axis is a vector in the direction of greatest variance orthogonal (perpendicular) to the first principal axis.

The third principal axis In a space with 3 or more dimensions, the third principal axis is the direction of greatest variance orthogonal to both the first and second principal axes. The fourth and so on. The set of n principal axes of an n-dimensional space forms a basis.

Eigenvectors, Eigenvalues and Eigendecomposition

Eigenvectors and Eigenvalues Very important equation Av=λv

Eigenvectors and Eigenvalues In the equation Av = λv: A is an n×n square matrix, λ is a scalar value known as an eigenvalue, and v is an n-dimensional vector known as an eigenvector.

Eigenvectors and Eigenvalues Av=λv There are at most n eigenvector-eigenvalue pairs If A is symmetric, then the set of eigenvectors is orthogonal Can you see where this is going?

Eigenvectors and Eigenvalues Av=λv If A is a covariance matrix, then the eigenvectors are the principal axes. The eigenvalues are proportional to the variance of the data along each eigenvector. The eigenvector corresponding to the largest eigenvalue is the first P.C.
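
This claim can be checked numerically. The sketch below (with data generated only for illustration, not from the lecture) builds a covariance matrix, eigendecomposes it, and verifies that Av = λv and that the largest eigenvalue equals the variance of the mean-centred data projected onto its eigenvector:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])  # correlated 2-D data
    Z = X - X.mean(axis=0)
    A = (Z.T @ Z) / len(Z)                 # covariance matrix

    vals, vecs = np.linalg.eigh(A)         # symmetric matrix: orthogonal eigenvectors, ascending eigenvalues
    v, lam = vecs[:, -1], vals[-1]         # eigenvector with the largest eigenvalue

    assert np.allclose(A @ v, lam * v)     # Av = λv
    print(lam, np.var(Z @ v))              # eigenvalue equals the variance along that axis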

Finding the EVecs and EVals For small matrices (n ≤ 4) there are algebraic solutions for finding all the eigenvector-eigenvalue pairs. For larger matrices, numerical solutions to the Eigendecomposition must be sought.

Eigendecomposition A = QΛQ⁻¹, where the columns of Q are the eigenvectors and Λ is the diagonal eigenvalue matrix (Λ_ii = λ_i).

Eigendecomposition A = QΛQ⁻¹. If A is real symmetric (i.e. a covariance matrix), then Q⁻¹ = Qᵀ (i.e. the eigenvectors are orthogonal), so: A = QΛQᵀ.

Eigendecomposition In summary, the Eigendecomposition of a covariance matrix A: A = QΛQᵀ gives you the principal axes and their relative magnitudes.

Ordering Standard Eigendecomposition implementations will order the eigenvectors (columns of Q) such that the eigenvalues (in the diagonal of Λ) are sorted in order of decreasing value. Some solvers are optimised to only find the top k eigenvalues and corresponding eigenvectors, rather than all of them.
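
As a sketch of the ordering step (the matrix here is illustrative; NumPy's eigh happens to return eigenvalues in ascending order, and SciPy's eigsh is one example of a top-k solver):

    import numpy as np
    from scipy.sparse.linalg import eigsh

    A = np.cov(np.random.randn(200, 20), rowvar=False)  # an illustrative 20x20 covariance matrix

    # full decomposition, then reorder columns so eigenvalues are decreasing
    vals, Q = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    vals, Q = vals[order], Q[:, order]

    # or ask a sparse solver for just the top k eigenpairs (largest magnitude)
    k = 5
    vals_k, Q_k = eigsh(A, k=k, which='LM')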

Demo: Covariance, Eigendecomposition and principal axes

Principal Component Analysis

Linear Transform A linear transform W projects data from one space into another: T = ZW. The original data is stored in the rows of Z. T can have fewer dimensions than Z.

Linear Transforms The effects of a linear transform can be reversed if W is invertible: Z = TW⁻¹. It is a lossy process if the dimensionality of the spaces is different.
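
A quick sketch of both directions, assuming a random square (and therefore almost surely invertible) W chosen just for illustration:

    import numpy as np

    Z = np.random.randn(10, 3)       # original data, one vector per row
    W = np.random.randn(3, 3)        # random square transform matrix (almost surely invertible)

    T = Z @ W                        # project into the new space
    Z_back = T @ np.linalg.inv(W)    # reverse the transform
    assert np.allclose(Z, Z_back)
    # if W had fewer columns than rows (dimensionality reduction), no exact inverse
    # would exist and the reconstruction would be lossy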

PCA PCA is an Orthogonal Linear Transform that maps data from its original space to a space defined by the principal axes of the data. The transform matrix W is just the eigenvector matrix Q from the Eigendecomposition of the covariance matrix of the data. Dimensionality reduction can be achieved by removing the eigenvectors with low eigenvalues from Q (i.e. keeping the first L columns of Q assuming the eigenvectors are sorted by decreasing eigenvalue).

PCA Algorithm 1. Mean centre the data vectors. 2. Form the vectors into a matrix Z, such that each row corresponds to a vector. 3. Perform the Eigendecomposition of the matrix ZᵀZ, to recover the eigenvector matrix Q and diagonal eigenvalue matrix Λ: ZᵀZ = QΛQᵀ. 4. Sort the columns of Q and corresponding diagonal values of Λ so that the eigenvalues are decreasing. 5. Select the L largest eigenvectors of Q (the first L columns) to create the transform matrix Q_L. 6. Project the original vectors into a lower dimensional space, T_L: T_L = ZQ_L.
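
A minimal NumPy translation of these six steps (the function name and test data are made up for illustration):

    import numpy as np

    def pca(X, L):
        """Project the rows of X onto their first L principal axes."""
        Z = X - X.mean(axis=0)               # 1. mean centre; 2. rows of Z are the vectors
        vals, Q = np.linalg.eigh(Z.T @ Z)    # 3. ZᵀZ = QΛQᵀ
        order = np.argsort(vals)[::-1]       # 4. sort so eigenvalues are decreasing
        Q_L = Q[:, order][:, :L]             # 5. keep the first L columns as the transform
        return Z @ Q_L, Q_L                  # 6. T_L = Z Q_L

    X = np.random.randn(100, 5)              # illustrative data
    T_L, Q_L = pca(X, L=2)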

Demo: PCA

Eigenfaces (Eigenimages)

A simple kind of feature

but with some problems Not invariant to: changes in the position/orientation/scale of the object in the image; changes in lighting; the size of the image. Highly susceptible to image noise.

Making it invariant Require (almost) the same object pose across images (i.e. full frontal faces). Align (rotate, scale and translate) the images so that a common feature is in the same place (i.e. the eyes in a set of face images). Make all the aligned images the same size. (Optional) Normalise (or perhaps histogram equalise) the images so they are invariant to global intensity changes.

but there is still a bit of a problem The feature vectors are huge. If the images are 100x200 pixels, the vector has 20000 dimensions. That's not really practical. Also, the vectors are still highly susceptible to imaging noise and variations due to slight misalignments.

Potential solution Apply PCA. PCA can be used to reduce the dimensionality. A smaller number of dimensions allows greater robustness to noise and misalignment: there are fewer degrees of freedom, so noise/misalignment has much less effect and the dominant features are captured. Fewer dimensions also makes applying machine learning much more tractable.
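
Assuming a hypothetical stack of aligned, equally sized face images in a NumPy array (the array name, image size and the choice of 20 components below are made up), the eigenface idea is just this PCA recipe applied to flattened pixel vectors:

    import numpy as np

    faces = np.random.rand(50, 32, 32)        # hypothetical stack of 50 aligned 32x32 face images

    X = faces.reshape(len(faces), -1)         # one 1024-dimensional pixel vector per row
    mean_face = X.mean(axis=0)
    Z = X - mean_face                         # mean-centred faces

    vals, Q = np.linalg.eigh(Z.T @ Z)         # eigendecomposition of ZᵀZ (for big images, the
    order = np.argsort(vals)[::-1]            # much smaller ZZᵀ is usually decomposed instead)
    Q = Q[:, order]

    L = 20
    weights = Z @ Q[:, :L]                    # low-dimensional descriptor ("weights") per face
    reconstructions = mean_face + weights @ Q[:, :L].T  # approximate faces from the weights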

Demo: a dataset of faces

Demo: mean-face

Demo: mean-centred faces

Demo: principal components

Demo: reconstructed faces

Demo: reconstruction from weights

Summary Covariance measures the shape of data by measuring how different dimensions change together. The principal axes are a basis, aligned such that they describe the directions of greatest variance. The Eigendecomposition of the covariance matrix produces pairs of eigenvectors (corresponding to the principal axes) and eigenvalues (proportional to the variance along the respective axis). PCA aligns data with its principal axes, and allows dimensionality reduction by discarding axes with low variance. Eigenfaces applies PCA to vectors made from pixel values to make robust low-dimensional image descriptors.