Dimensionality Reduction: PCA. Nicholas Ruozzi, University of Texas at Dallas

Eigenvalues
$\lambda$ is an eigenvalue of a matrix $A \in \mathbb{R}^{n \times n}$ if the linear system $Ax = \lambda x$ has at least one non-zero solution. If $Ax = \lambda x$, we say that $\lambda$ is an eigenvalue of $A$ with corresponding eigenvector $x$. There can be multiple eigenvectors for the same $\lambda$.

Eigenvalues of Symmetric Matrices
If $A \in \mathbb{R}^{n \times n}$ is symmetric, then it has $n$ linearly independent eigenvectors $v_1, \dots, v_n$ corresponding to $n$ real eigenvalues. A symmetric matrix is positive definite if and only if all of its eigenvalues are positive.

Eigenvalues of Symmetric Matrices
If $A \in \mathbb{R}^{n \times n}$ is symmetric, then it has $n$ linearly independent eigenvectors $v_1, \dots, v_n$ corresponding to $n$ real eigenvalues. Moreover, these eigenvectors can be chosen to be orthonormal: $v_i^T v_j = 0$ for all $i \neq j$ and $v_i^T v_i = 1$ for all $i$.

Example
The $2 \times 2$ identity matrix has both of its eigenvalues equal to $1$, with orthonormal eigenvectors $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$. The matrix $\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$ has eigenvalues $0$ and $2$, with orthonormal eigenvectors $\frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and $\frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
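
As a quick check, here is a minimal NumPy sketch (the variable names are mine, not from the slides) that reproduces the second example:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

# For symmetric matrices, np.linalg.eigh returns real eigenvalues in
# ascending order and orthonormal eigenvectors as the columns of the
# second return value.
eigvals, eigvecs = np.linalg.eigh(A)

print(eigvals)              # [0. 2.]
print(eigvecs)              # columns are +/-[1, -1]/sqrt(2) and +/-[1, 1]/sqrt(2)
print(eigvecs.T @ eigvecs)  # (numerically) the identity: the eigenvectors are orthonormal
```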

Eigenvalues
Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric. Any $x \in \mathbb{R}^n$ can be written as $x = \sum_{i=1}^n c_i v_i$, where $v_1, \dots, v_n$ are the eigenvectors of $A$. Then $Ax = \sum_{i=1}^n \lambda_i c_i v_i$, $A^2 x = \sum_{i=1}^n \lambda_i^2 c_i v_i$, and more generally $A^t x = \sum_{i=1}^n \lambda_i^t c_i v_i$.

Eigenvalues
Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric. Any $x \in \mathbb{R}^n$ can be written as $x = \sum_{i=1}^n c_i v_i$, where $v_1, \dots, v_n$ are the eigenvectors of $A$. Here $c_i = v_i^T x$, which is the projection of $x$ onto the line spanned by $v_i$ (assuming that $v_i$ is a unit vector).
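
A small sketch of this eigenbasis expansion, using an arbitrary symmetric matrix; all names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                      # an arbitrary symmetric matrix
x = rng.standard_normal(4)

lams, V = np.linalg.eigh(A)            # columns of V are orthonormal eigenvectors
c = V.T @ x                            # c_i = v_i^T x, the projection coefficients

print(np.allclose(V @ c, x))           # x = sum_i c_i v_i
t = 3
print(np.allclose(np.linalg.matrix_power(A, t) @ x,
                  V @ (lams**t * c)))  # A^t x = sum_i lambda_i^t c_i v_i
```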

Eigenvalues
Let $Q \in \mathbb{R}^{n \times n}$ be the matrix whose $i$th column is $v_i$, and let $D \in \mathbb{R}^{n \times n}$ be the diagonal matrix with $D_{ii} = \lambda_i$. Then $Ax = QDQ^T x$. We can throw away some eigenvectors to approximate this quantity: for example, let $Q_k$ be the matrix formed by keeping only the top $k$ eigenvectors and $D_k$ be the diagonal matrix whose diagonal consists of the top $k$ eigenvalues.

Frobenius Norm
The Frobenius norm is a matrix norm written as $\|A\|_F = \sqrt{\sum_{i=1}^n \sum_{j=1}^n A_{ij}^2}$. The truncation $Q_k D_k Q_k^T$ is the best rank-$k$ approximation of the symmetric matrix $A$ with respect to the Frobenius norm.
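
A minimal sketch of this rank-$k$ truncation for a symmetric matrix; the function name and the choice to rank eigenvalues by magnitude are my own:

```python
import numpy as np

def rank_k_approx(A, k):
    """Best rank-k approximation (in Frobenius norm) of a symmetric matrix A."""
    lams, Q = np.linalg.eigh(A)                # eigenvalues in ascending order
    idx = np.argsort(np.abs(lams))[::-1][:k]   # keep the k largest in magnitude
    Qk, Dk = Q[:, idx], np.diag(lams[idx])
    return Qk @ Dk @ Qk.T
```

For a positive semidefinite matrix (like the covariance matrices below), the largest eigenvalues in magnitude are simply the largest eigenvalues, matching the "top $k$" of the previous slide.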

Principal Component Analysis
Given a collection of data points $x_1, \dots, x_p \in \mathbb{R}^n$ sampled from some distribution, construct the matrix $X \in \mathbb{R}^{n \times p}$ whose $i$th column is $x_i$. We want to reduce the dimensionality of the data while still maintaining a good approximation of the sample mean and variance.

Principal Component Analysis
Construct the matrix $W \in \mathbb{R}^{n \times p}$ whose $i$th column is $x_i - \frac{1}{p}\sum_j x_j$. This gives the data zero mean. The matrix $WW^T$ is the sample covariance matrix; $WW^T$ is symmetric and positive semidefinite (a simple proof comes later).

Principal Component Analysis
PCA attempts to find a set of orthogonal vectors that best explain the variance captured by the sample covariance matrix. From our previous discussion, these are exactly the eigenvectors of $WW^T$.
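
Putting the last few slides together, a sketch of PCA by eigendecomposition of $WW^T$; the function and argument names are placeholders, and the data matrix is assumed to hold one point per column as above:

```python
import numpy as np

def pca_eig(X, k):
    """Top-k principal directions of X (n x p, one data point per column)."""
    W = X - X.mean(axis=1, keepdims=True)   # subtract the column mean from each x_i
    C = W @ W.T                             # n x n sample covariance (slides' convention, no 1/p)
    lams, V = np.linalg.eigh(C)             # ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(lams)[::-1][:k]      # keep the top-k eigenvalues/eigenvectors
    return V[:, order], lams[order]
```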

PCA in Practice
Forming the matrix $WW^T$ can require a lot of memory (especially if $n \gg p$). We need a faster way to compute its eigenvectors without forming the matrix explicitly. The typical approach is to use the singular value decomposition.

Singular Value Decomposition (SVD)
Every matrix $B \in \mathbb{R}^{n \times p}$ admits a decomposition of the form $B = U\Sigma V^T$, where $U \in \mathbb{R}^{n \times n}$ is an orthogonal matrix, $\Sigma \in \mathbb{R}^{n \times p}$ is a non-negative diagonal matrix, and $V \in \mathbb{R}^{p \times p}$ is an orthogonal matrix. The diagonal elements of $\Sigma$ are called the singular values of $B$. A matrix $C \in \mathbb{R}^{m \times m}$ is orthogonal if $C^T = C^{-1}$; equivalently, the rows and columns of $C$ are orthonormal vectors.

SVD and PCA
Returning to PCA, let $W = U\Sigma V^T$ be the SVD of $W$. Then $WW^T = U\Sigma V^T V \Sigma^T U^T = U\Sigma\Sigma^T U^T$, so $U$ is the matrix of eigenvectors of $WW^T$. If we can compute the SVD of $W$, then we don't need to form the matrix $WW^T$.
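
A minimal sketch of the same computation via the SVD of the centered data matrix, avoiding $WW^T$ entirely (again, the names are placeholders):

```python
import numpy as np

def pca_svd(X, k):
    """Top-k principal directions of X (n x p, one point per column) via the SVD."""
    W = X - X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)  # W = U diag(s) V^T, s descending
    # Columns of U are eigenvectors of W W^T; the eigenvalues are the squared singular values.
    return U[:, :k], s[:k] ** 2
```

Using full_matrices=False keeps the factors thin, so the $n \times n$ matrix $WW^T$ is never allocated.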

SVD and PCA
For any matrix $A$, $AA^T$ is symmetric and positive semidefinite. Let $A = U\Sigma V^T$ be the SVD of $A$. Then $AA^T = U\Sigma V^T V \Sigma^T U^T = U\Sigma\Sigma^T U^T$, so $U$ is the matrix of eigenvectors of $AA^T$. The eigenvalues of $AA^T$ are all non-negative because $\Sigma\Sigma^T$ is a diagonal matrix whose entries are the squares of the singular values of $A$.

An Example: Eigenfaces
Let's suppose that our data is a collection of images of the faces of individuals. The goal is, given the "training data", to correctly label unseen images. Suppose that each image is an $s \times s$ array of pixels, so $x_i \in \mathbb{R}^n$ with $n = s^2$. As before, construct the matrix $W \in \mathbb{R}^{n \times p}$ whose $i$th column is $x_i - \frac{1}{p}\sum_j x_j$.

An Example: Eigenfaces
Forming the matrix $WW^T$ requires a lot of memory: with $s = 256$, $WW^T$ is $65536 \times 65536$. We need a faster way to compute its eigenvectors without forming the matrix explicitly. We could use the singular value decomposition.

An Example: Eigenfaces
A different approach works when $p \ll n$: compute the eigenvectors of $A^T A$ (this is a $p \times p$ matrix), here with $A = W$. Let $v$ be an eigenvector of $A^T A$ with eigenvalue $\lambda$. Then $AA^T (Av) = \lambda (Av)$, which means that $Av$ is an eigenvector of $AA^T$ with eigenvalue $\lambda$ (or $Av = 0$). Save the top $k$ eigenvectors, called eigenfaces in this example.
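
A sketch of this $p \times p$ trick, assuming $W$ is the $n \times p$ centered image matrix from before and that the top $k$ eigenvalues are strictly positive; the names are mine:

```python
import numpy as np

def top_k_eigenfaces(W, k):
    """Top-k eigenvectors of W W^T computed via the smaller p x p matrix W^T W."""
    G = W.T @ W                              # p x p matrix
    lams, V = np.linalg.eigh(G)              # ascending eigenvalues
    order = np.argsort(lams)[::-1][:k]       # top-k (assumed positive)
    lams, V = lams[order], V[:, order]
    U = W @ V                                # W v is an eigenvector of W W^T with the same eigenvalue
    U /= np.linalg.norm(U, axis=0)           # normalize each eigenface to unit length
    return U, lams
```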

An Example: Eigenfaces
The data in the matrix is training data. Given a new image, we'd like to determine which, if any, member of the data set it belongs to. Step 1: compute the projection of the recentered image onto each of the $k$ eigenvectors. This gives us a vector of weights $c_1, \dots, c_k$.

An Example: Eigenfaces
Step 2: determine whether the input image is close to one of the faces in the data set. If the distance between the input and its approximation (its reconstruction from the $k$ eigenfaces) is too large, then the input is likely not a face.

An Example: Eigenfaces
Step 3: find the person in the training data that is closest to the new input. Replace each group of training images (all the images of one person) by its average, and compute the distance $\|c - a_i\|$ to the $i$th average, where $a_i$ is the vector of coefficients of the average face for person $i$.
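
A sketch of the three steps together, assuming U holds the k eigenfaces as columns, mean_face is the training mean, avg_coeffs holds one row of projection coefficients per person's average face, and both thresholds are illustrative:

```python
import numpy as np

def classify_face(image, U, mean_face, avg_coeffs, face_thresh, person_thresh):
    """Project a new image onto the eigenfaces and return the nearest person, or None."""
    c = U.T @ (image - mean_face)                    # Step 1: weights c_1, ..., c_k
    reconstruction = mean_face + U @ c
    if np.linalg.norm(image - reconstruction) > face_thresh:
        return None                                  # Step 2: too far from the face subspace
    dists = np.linalg.norm(avg_coeffs - c, axis=1)   # Step 3: ||c - a_i|| for each person i
    i = int(np.argmin(dists))
    return i if dists[i] <= person_thresh else None
```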