LECTURE 16: PCA AND SVD


Instructor: Sael Lee, CS549 Computational Biology
Resources: PCA slides by Iyad Batal; Chapter 12 of PRML; Shlens, J. (2003), A Tutorial on Principal Component Analysis.

CONTENT
Principal Component Analysis (PCA)
Singular Value Decomposition (SVD)

PRINCIPAL COMPONENT ANALYSIS
PCA finds a linear projection of high-dimensional data into a lower-dimensional subspace such that:
The variance retained is maximized.
The least-squares reconstruction error is minimized.

PCA STEPS
Linearly transform an N x d matrix X into an N x m matrix Y (m <= d):
Centralize the data (subtract the mean).
Calculate the d x d covariance matrix: C = (1/(N-1)) X^T X, i.e. C_{i,j} = (1/(N-1)) sum_{q=1..N} X_{q,i} X_{q,j}.
C_{i,i} (diagonal) is the variance of variable i.
C_{i,j} (off-diagonal) is the covariance between variables i and j.
Calculate the eigenvectors of the covariance matrix (they are orthonormal).
Select the m eigenvectors that correspond to the largest m eigenvalues to be the new basis.
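A minimal MATLAB/Octave sketch of these steps; the data matrix X and the choice m = 2 are illustrative assumptions, not values from the lecture:

% X: N x d data matrix (rows = observations, columns = variables); m: target dimension
X  = randn(100, 5);                          % illustrative data, not from the lecture
m  = 2;                                      % illustrative choice of m
Xc = X - repmat(mean(X,1), size(X,1), 1);    % centralize the data (subtract the mean)
C  = (Xc' * Xc) / (size(X,1) - 1);           % d x d covariance matrix
[V, L] = eig(C);                             % eigenvectors (columns of V) and eigenvalues
[lambda, idx] = sort(diag(L), 'descend');    % sort eigenvalues, largest first
W  = V(:, idx(1:m));                         % the m leading eigenvectors form the new basis
Y  = Xc * W;                                 % N x m projected data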

EIGENVECTORS
If A is a square matrix, a non-zero vector v is an eigenvector of A if there is a scalar λ (the eigenvalue) such that A v = λ v.
Example: if we think of the square matrix A as a transformation matrix, then multiplying it with one of its eigenvectors does not change that vector's direction, only its length (by the factor λ).
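A small MATLAB/Octave check of this definition; the matrix A below is an arbitrary illustration, not from the lecture:

A = [2 1; 1 2];              % an arbitrary symmetric 2 x 2 matrix
[V, D] = eig(A);             % columns of V are eigenvectors, diag(D) the eigenvalues
v = V(:,1);  lambda = D(1,1);
disp(norm(A*v - lambda*v))   % ~0: multiplying by A only rescales v, it does not rotate it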

PCA EXAMPLE X : the data matrix with N=11 objects and d=2 dimensions

Step 1: subtract the mean and calculate the covariance matrix C.

Step 2: Calculate the eigenvectors v1, v2 and the eigenvalues of the covariance matrix. Notice that v1 and v2 are orthonormal (unit length and mutually perpendicular).

Step 3: project the data. Let V = [v_1, ..., v_m] be the d x m matrix whose columns v_i are the eigenvectors corresponding to the largest m eigenvalues. The projected data Y = X V is an N x m matrix. If m = d (more precisely, m = rank(X)), then there is no loss of information!

Step 3: project the data. The eigenvector with the highest eigenvalue is the principal component of the data: if we are allowed to pick only one dimension, the principal component is the best direction (it retains the maximum variance). Here the principal component is v1 = [0.677, 0.735]^T.
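A short MATLAB/Octave sketch of this projection step, assuming Xc holds the centered 11 x 2 data matrix of the example (the numerical data values are not reproduced here):

% Xc: the centered 11 x 2 data matrix of the example
C  = (Xc' * Xc) / (size(Xc,1) - 1);     % 2 x 2 covariance matrix
[V, L] = eig(C);
[~, idx] = sort(diag(L), 'descend');
v1 = V(:, idx(1));                      % principal component, approx. [0.677; 0.735] (up to sign)
y  = Xc * v1;                           % 11 x 1 projection onto the principal component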

USEFUL PROPERTIES
The covariance matrix is always symmetric.
The principal components of X are orthonormal.

USEFUL PROPERTIES
Theorem 1: if the square d x d matrix S is real and symmetric (S = S^T), then S = V Λ V^T, where V = [v_1, ..., v_d] contains the eigenvectors of S and Λ = diag(λ_1, ..., λ_d) the eigenvalues.
Proof: S V = V Λ, i.e. [S v_1 ... S v_d] = [λ_1 v_1 ... λ_d v_d], which is just the definition of eigenvectors. Hence S = V Λ V^-1 = V Λ V^T, because V is orthonormal (V^-1 = V^T).
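A quick numerical sanity check of Theorem 1 in MATLAB/Octave; the matrix S is an arbitrary illustration:

S = randn(4);  S = (S + S') / 2;   % an arbitrary real symmetric 4 x 4 matrix
[V, L] = eig(S);                   % V orthonormal, L = diag(lambda_1, ..., lambda_4)
disp(norm(S - V*L*V'))             % ~0: S = V * Lambda * V'
disp(norm(V'*V - eye(4)))          % ~0: V is orthonormal, so inv(V) = V'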

USEFUL PROPERTIES
The projected data: Y = X V.
The covariance matrix of Y is C_Y = (1/(N-1)) Y^T Y = (1/(N-1)) V^T X^T X V = V^T C_X V = V^T (V Λ V^T) V = Λ, because the covariance matrix C_X is symmetric (so C_X = V Λ V^T by Theorem 1) and V is orthonormal (V^T V = I).
After the transformation, the covariance matrix becomes diagonal.
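A numerical check in MATLAB/Octave that the covariance of the projected data is diagonal; the data matrix is an illustrative assumption, not from the lecture:

X  = randn(200, 3);                          % illustrative data
Xc = X - repmat(mean(X,1), size(X,1), 1);    % centralize
C  = (Xc' * Xc) / (size(X,1) - 1);
[V, ~] = eig(C);
Y  = Xc * V;                                 % projected data
CY = (Y' * Y) / (size(Y,1) - 1);             % covariance of the projected data
disp(CY)                                     % off-diagonal entries are ~0 (diagonal matrix)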

DERIVATION OF PCA: 1. MAXIMIZING VARIANCE
Assume the best transformation is the one that maximizes the variance of the projected data.
Find the expression for the variance of the projected data.
Introduce the unit-length constraint on the projection axis.
Maximize the resulting unconstrained (Lagrangian) objective: take the derivative with respect to the projection axis and set it to zero.
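A compact version of this derivation, written as a sketch under the assumption that we project onto a single unit vector u_1 and that C denotes the sample covariance matrix (the symbols u_1 and λ_1 follow PRML's notation):

\max_{u_1}\; u_1^{T} C u_1 \quad \text{subject to} \quad u_1^{T} u_1 = 1

L(u_1, \lambda_1) = u_1^{T} C u_1 + \lambda_1 \left( 1 - u_1^{T} u_1 \right)

\frac{\partial L}{\partial u_1} = 2 C u_1 - 2 \lambda_1 u_1 = 0 \;\Longrightarrow\; C u_1 = \lambda_1 u_1

So u_1 must be an eigenvector of C, and the projected variance u_1^T C u_1 = λ_1 is largest when λ_1 is the largest eigenvalue.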

DERIVATION OF PCA: 2. MINIMIZING TRANSFORMATION ERROR
Define the reconstruction error.
Identify the variables that need to be optimized in the error.
Minimize the error and solve for those variables.
Interpret the result.
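A brief sketch of this second route, assuming centered data x_n, an orthonormal d x m basis V, and C = \frac{1}{N}\sum_n x_n x_n^T (the normalization constant does not change the minimizer):

J(V) = \frac{1}{N} \sum_{n=1}^{N} \left\| x_n - V V^{T} x_n \right\|^2 = \operatorname{tr}(C) - \operatorname{tr}\!\left( V^{T} C V \right)

Minimizing J is therefore equivalent to maximizing tr(V^T C V), which is achieved when the columns of V are the eigenvectors of C with the m largest eigenvalues; the residual error is then the sum of the discarded eigenvalues, J = \sum_{i=m+1}^{d} \lambda_i.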

SINGULAR VALUE DECOMPOSITION (SVD)
Any N x d matrix X can be uniquely expressed as X = U Σ V^T, where:
r is the rank of the matrix X (the number of linearly independent columns/rows),
U is a column-orthonormal N x r matrix,
Σ is a diagonal r x r matrix whose singular values σ_i are sorted in descending order,
V is a column-orthonormal d x r matrix.
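A small MATLAB/Octave illustration of the decomposition using the economy-size SVD; the matrix X is an arbitrary example:

X = randn(6, 3);                 % an arbitrary N x d matrix
[U, S, V] = svd(X, 'econ');      % economy-size SVD: U is N x r, S is r x r, V is d x r (full-rank case)
disp(norm(X - U*S*V'))           % ~0: X is reconstructed exactly
disp(diag(S)')                   % singular values, sorted in descending order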

PCA AND SVD RELATION
Theorem: Let X = U Σ V^T be the SVD of an N x d matrix X and C = (1/(N-1)) X^T X be its d x d covariance matrix. Then the eigenvectors of C are the same as the right singular vectors of X.
Proof: C = (1/(N-1)) X^T X = (1/(N-1)) V Σ U^T U Σ V^T = V (Σ^2 / (N-1)) V^T. But C is symmetric, hence C = V Λ V^T. Therefore the eigenvectors of the covariance matrix C are the columns of V (the right singular vectors), and the eigenvalues of C can be computed from the singular values: λ_i = σ_i^2 / (N-1).
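A numerical check of this relation in MATLAB/Octave; the data are an illustrative assumption, and the eigenvectors may differ from the singular vectors by a sign:

N  = 50;  X = randn(N, 4);                % illustrative data, not from the lecture
Xc = X - repmat(mean(X,1), N, 1);         % center the data
C  = (Xc' * Xc) / (N - 1);                % covariance matrix
[~, S, V] = svd(Xc, 'econ');              % right singular vectors and singular values of X
lambda_svd = diag(S).^2 / (N - 1);        % lambda_i = sigma_i^2 / (N - 1)
lambda_eig = sort(eig(C), 'descend');     % eigenvalues of the covariance matrix
disp([lambda_svd, lambda_eig])            % the two columns agree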

The singular value decomposition and the eigendecomposition are closely related. Namely:
The left singular vectors of X are eigenvectors of X X^T.
The right singular vectors of X are eigenvectors of X^T X.
The non-zero singular values of X (found on the diagonal of Σ) are the square roots of the non-zero eigenvalues of both X^T X and X X^T.

ASSUMPTIONS OF PCA
I. Linearity.
II. Mean and variance are sufficient statistics (a Gaussian distribution is assumed).
III. Large variances have important dynamics.
IV. The principal components are orthogonal.

PCA WITH EIGENVALUE DECOMPOSITION

function [signals,PC,V] = pca1(data)
% PCA1: Perform PCA using covariance.
% data    - MxN matrix of input data (M dimensions, N trials)
% signals - MxN matrix of projected data
% PC      - each column is a PC
% V       - Mx1 matrix of variances
[M,N] = size(data);
% subtract off the mean for each dimension
mn = mean(data,2);
data = data - repmat(mn,1,N);
% calculate the covariance matrix
covariance = 1 / (N-1) * (data * data');
% find the eigenvectors and eigenvalues
[PC, V] = eig(covariance);
% extract diagonal of matrix as vector
V = diag(V);
% sort the variances in decreasing order
[junk, rindices] = sort(-1*V);
V = V(rindices);
PC = PC(:,rindices);
% project the original data set
signals = PC' * data;

Shlens, J. (2003). A tutorial on principal component analysis.

PCA WITH SVD

function [signals,PC,V] = pca2(data)
% PCA2: Perform PCA using SVD.
% data    - MxN matrix of input data (M dimensions, N trials)
% signals - MxN matrix of projected data
% PC      - each column is a PC
% V       - Mx1 matrix of variances
[M,N] = size(data);
% subtract off the mean for each dimension
mn = mean(data,2);
data = data - repmat(mn,1,N);
% construct the matrix Y (so that Y'*Y equals the covariance matrix of data)
Y = data' / sqrt(N-1);
% SVD does it all
[u,S,PC] = svd(Y);
% calculate the variances
S = diag(S);
V = S .* S;
% project the original data
signals = PC' * data;

Shlens, J. (2003). A tutorial on principal component analysis.
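A brief usage example, assuming the two functions above are saved as pca1.m and pca2.m on the MATLAB path; the data matrix is an illustrative assumption:

data = randn(3, 100);                 % 3 dimensions, 100 trials (illustrative)
[signals1, PC1, V1] = pca1(data);     % eigendecomposition route
[signals2, PC2, V2] = pca2(data);     % SVD route
disp([V1, V2])                        % the variances agree
disp(norm(abs(PC1) - abs(PC2)))       % ~0: the components agree up to sign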