Principal Component Analysis CS498

Today's lecture. Adaptive feature extraction: Principal Component Analysis (how, why, when, which).

A dual goal. Find a good representation (the features part). Reduce redundancy in the data (a side effect of proper features).

Describe this input Example case

What about now?

A good feature: simplifies the explanation of the input, represents repeating patterns, and when defined makes the input simpler. How do we define these abstract qualities? On to the math.

Linear features: Z = WX, where Z is the weight matrix (features x samples), W is the feature matrix (features x dimensions), and X is the input matrix (dimensions x samples).

A 2D case. Matrix representation of the data: Z = WX, i.e. [z_1^T; z_2^T] = [w_1^T; w_2^T] [x_1^T; x_2^T]. (Figure: scatter plot of the same data in the (x_1, x_2) plane.)

Defining a goal. Desirable features: give simple weights, avoid feature similarity. How do we define these?

One way to proceed. Simple weights: minimize the relation of the two dimensions. Feature similarity: same thing!

One way to proceed. Simple weights: minimize the relation of the two dimensions, i.e. decorrelate: z_1^T z_2 = 0. Feature similarity: same thing! Decorrelate: w_1^T w_2 = 0.

Diagonalizing the covariance. Covariance matrix: Cov(z_1, z_2) = [z_1^T z_1, z_1^T z_2; z_2^T z_1, z_2^T z_2] / N. Diagonalizing the covariance suppresses cross-dimensional co-activity (if z_1 is high, z_2 won't be, etc.): Cov(z_1, z_2) = [z_1^T z_1, z_1^T z_2; z_2^T z_1, z_2^T z_2] / N = [1 0; 0 1] = I.

Problem definition. For a given input X, find a feature matrix W so that the weights decorrelate: (WX)(WX)^T = N·I, i.e. ZZ^T = N·I.

Any ideas? How do we solve (WX)(WX)^T = N·I for W?

Solving for diagonalization. (WX)(WX)^T = N·I, so W X X^T W^T = N·I, so W Cov(X) W^T = I.

Solving for diagonalization. Covariance matrices are symmetric and positive (semi)definite, so they have orthogonal eigenvectors and real, non-negative eigenvalues, and are factorizable as U^T A U = Λ, where U has the eigenvectors of A in its columns and Λ = diag(λ_i), the λ_i being the eigenvalues of A.

Solving for diagonalization. The solution is a function of the eigenvectors U and eigenvalues Λ of Cov(X): from W Cov(X) W^T = I we get W = Λ^(-1/2) U^T = [λ_1 0; 0 λ_2]^(-1/2) U^T.
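
As a concrete illustration, here is a minimal MATLAB/Octave sketch of this construction on synthetic zero-mean 2-D data (the data and variable names are illustrative, not from the lecture):

% Whitening via the eigendecomposition of the covariance (illustrative sketch)
N = 1000;
X = randn(2, N);                         % synthetic zero-mean 2-D data
X(2,:) = 0.3 * X(1,:) + 0.1 * X(2,:);    % make the two dimensions correlated
C = (X * X') / N;                        % covariance estimate
[U, Lambda] = eig(C);                    % C = U * Lambda * U'
W = diag(diag(Lambda) .^ (-1/2)) * U';   % W = Lambda^(-1/2) * U'
Z = W * X;                               % decorrelated feature weights
disp((Z * Z') / N)                       % approximately the identity matrix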

So what does it do? Input data covariance: Cov(X) = [14.9 0.05; 0.05 1.08]. Extracted feature matrix: W = [0.12 30.4; 8.17 0.03]. Weights covariance: Cov(WX) = (WX)(WX)^T / N = [1 0; 0 1].

Another solution. This is not the only solution to the problem. Consider this one: (WX)(WX)^T = N·I, so W X X^T W^T = N·I, so W = (X X^T)^(-1/2).

Another solution. This is not the only solution to the problem. Consider this one (similar, but out of scope for now): (WX)(WX)^T = N·I, so W X X^T W^T = N·I, so W = (X X^T)^(-1/2) = U S^(-1/2) V^T, where [U,S,V] = SVD(X X^T).
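
A quick sketch of this second, symmetric solution, continuing with X and N from the sketch above; the sample covariance is used here so that the normalization matches Cov(WX) = I:

% The symmetric alternative: W = C^(-1/2), built from the SVD of the covariance
C = (X * X') / N;
[U, S, V] = svd(C);                      % for symmetric C, U and V agree up to sign
W2 = U * diag(diag(S) .^ (-1/2)) * V';   % W2 = C^(-1/2)
Z2 = W2 * X;
disp((Z2 * Z2') / N)                     % again approximately the identity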

Decorrelation in pictures. An implicit Gaussian assumption: N-D data has N directions of variance. (Figure: input data.)

Undoing the variance. The decorrelating matrix W contains two vectors that normalize the input's variance. (Figure: input data.)

Resulting transform. The input gets scaled to a well-behaved Gaussian with unit variance in all dimensions. (Figures: input data and transformed data, i.e. the feature weights.)

A more complex case. With correlation between the two dimensions we still find the directions of maximal variance, but we also rotate in addition to scaling. (Figures: input data and transformed data, i.e. the feature weights.)

One more detail. So far we considered zero-mean inputs, where the transforming operation was a rotation. If the input mean is not zero, bad things happen! Make sure that your data is zero-mean! (Figures: input data and transformed data, i.e. the feature weights.)

Principal Component Analysis. This transform is known as PCA. The features are the principal components: they are orthogonal to each other and produce orthogonal (white) weights. It is a major tool in statistics: it removes dependencies from multivariate data. Also known as the KLT, the Karhunen-Loeve transform.

A better way to compute PCA: the Singular Value Decomposition way. [U,S,V] = SVD(A), with A = U S V^T. Relationship to eigendecomposition: in our case (covariance input A), U and S will hold the eigenvectors/values of A. Why the SVD? More stable, more robust, fancy extensions.

PCA through the SVD. One route is to take the SVD of the input matrix (dimensions x samples) directly, which yields the feature matrix, the eigenvalue matrix, and the weight matrix in one shot. The other is to take the SVD of the input covariance (dimensions x dimensions), which yields the feature matrix (features x dimensions), the eigenvalue matrix (features x features), and the weight matrix.
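
A minimal MATLAB/Octave sketch of both routes on synthetic zero-mean data (sizes and variable names are illustrative):

% PCA through the SVD: of the data matrix, and of its covariance
N = 500;
X = randn(3, N);
X(3,:) = X(1,:) + 0.1 * randn(1, N);    % correlated 3-D data
X = X - mean(X, 2);                     % make it zero-mean

[U1, S1, V1] = svd(X, 'econ');          % route 1: columns of U1 = principal components
C = (X * X') / N;
[U2, S2, ~] = svd(C);                   % route 2: same components, diag(S2) = their variances

Z = U1' * X;                            % feature weights (projections onto the components)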

Dimensionality reduction. PCA is great for high-dimensional data: it allows us to perform dimensionality reduction, helps us find relevant structure in the data, and helps us throw away things that won't matter.

A simple example. Two very correlated dimensions, e.g. size and weight of fruit: one effective variable. The PCA matrix here is W = [0.2 0.13; 13.7 28.2]. There is a large difference in variance between the two components, about two orders of magnitude.

A simple example. The second principal component needs to be super-boosted to whiten the weights; maybe it is useless? Keep only the high-variance components and throw away the ones with minor contributions.

What is the number of dimensions? If the input was M-dimensional, how many dimensions do we keep? There is no solid answer (estimators exist but are flaky). Look at the singular/eigenvalues: they show the variance of each component, and at some point it becomes small.
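
One common heuristic (an addition here, not from the slides) is to keep enough components to explain a fixed fraction of the total variance. A sketch, reusing S2 from the SVD sketch above:

% Keep the smallest k explaining, say, 95% of the total variance (heuristic)
lambda = diag(S2);                         % per-component variances
explained = cumsum(lambda) / sum(lambda);  % cumulative fraction of variance
k = find(explained >= 0.95, 1)             % number of components to keep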

Example: eigenvalues of 1200-dimensional video data. There is little variance after component 30, so we don't need to keep the rest of the data. (Figure: eigenvalue magnitude, up to about 2.5 x 10^5, plotted for components 1 to 50.)

So where are the features? We strayed off-subject: what happened to the features? We only mentioned that they are orthogonal. We have talked about the weights so far; let's talk about the principal components. They should encapsulate structure. What do they look like?

Face analysis. Analysis of a face database: what are good features for faces? Is there anything special there? What will PCA give us? Any extra insight? Let's use MATLAB to find out.

The Eigenfaces

Low-rank model. Instead of using 780 pixel values we use the PCA weights (here 50 and 5). (Figure: an input face, its full approximation, the mean face, the five dominant eigenfaces with eigenvalues 985.953, 732.591, 655.408, 229.737 and 227.179, and the cumulative approximations using 1 to 5 components.)
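
A sketch of the low-rank reconstruction step, assuming the faces are stored as the columns of a pixels-by-images matrix F; this F and the choice k = 50 are illustrative, not the actual database used in the lecture:

% Rank-k face model: mean face plus k eigenfaces weighted per image
mu = mean(F, 2);                   % the mean face
Fc = F - mu;                       % centered faces
[U, S, V] = svd(Fc, 'econ');       % columns of U = eigenfaces
k = 50;
Z = U(:, 1:k)' * Fc;               % k PCA weights per face
Fhat = U(:, 1:k) * Z + mu;         % rank-k approximation of every face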

PCA for large dimensionalities. Sometimes the data is high-dimensional, e.g. videos: 1280x720xT means 921,600-D data over T frames. You will not do an SVD that big! Its complexity is O(4m^2 n + 8mn^2 + 9n^3). A useful approach is EM-PCA.

EM-PCA in a nutshell. Alternate between successive approximations: start with a random C and loop over Z = C^+ X, then C = X Z^+ (where ^+ denotes the pseudoinverse). After convergence C spans the PCA space. If we choose a low-rank C then the computations are significantly more efficient than the SVD. More later when we cover EM.
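
A minimal sketch of that loop, assuming a zero-mean data matrix X (dimensions x samples); the rank k and the fixed iteration count are illustrative:

% EM-PCA (after Roweis): alternate the two pseudoinverse steps
k = 5;
C = randn(size(X, 1), k);          % random initial basis
for it = 1:100
    Z = pinv(C) * X;               % "E-step": Z = C+ X
    C = X * pinv(Z);               % "M-step": C = X Z+
end
Q = orth(C);                       % C spans the PCA subspace; orthonormalize to read it off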

PCA for online data Sometimes we have too many data samples Irrespective of the dimensionality e.g. long video recordings Incremental SVD algorithms Update the U,S,V matrices with only a small set or a single sample point Very efficient updates

A Video Example. The movie is a series of frames, and each frame is a data point: 126 frames of 80x60 pixels, so the data will be 4800x126. We can do PCA on that.

PCA Results

PCA for online data II. Neural net algorithms are naturally online approaches: with each new datum, the PCs are updated. Oja's and Sanger's rules are gradient algorithms that update W. Great when you have minimal resources.
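
For the first principal component, Oja's rule needs only one vector of state; here is a sketch on streaming synthetic samples (the learning rate and the data are illustrative):

% Oja's rule for the leading principal component, one sample at a time
d = 5; eta = 0.01;
w = randn(d, 1); w = w / norm(w);      % random unit-length start
for t = 1:20000
    x = randn(d, 1); x(2) = 2 * x(1);  % one streaming zero-mean sample
    y = w' * x;                        % projection onto the current estimate
    w = w + eta * y * (x - y * w);     % Oja's update, keeps ||w|| near 1
end
% w now approximates the leading eigenvector of the data covariance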

PCA and the Fourier transform. We've seen why sinusoids are important, but can we statistically justify it? PCA has a deep connection with the DFT; in fact you can derive the DFT from PCA.

An example. Let's take a time series which is not white: each sample is somewhat correlated with the previous one (a Markov process), x(t), ..., x(t+T). We'll make it multidimensional: X = [x(t), x(t+1), ...; ...; x(t+N), x(t+1+N), ...], i.e. each column is an N-sample window of the series, shifted by one sample from its neighbor.

An example. In this context, features will be repeating temporal patterns smaller than N: Z = W [x(t), x(t+1), ...; ...; x(t+N), x(t+1+N), ...]. If W is the Fourier matrix then we are performing a frequency analysis.

PCA on the time series. By definition there is a correlation between successive samples: Cov(X) = [1, 1-ε, ..., ~0; 1-ε, 1, 1-ε, ...; ...; ~0, ..., 1-ε, 1]. The resulting covariance matrix will be symmetric Toeplitz, with the diagonals tapering towards 0.

Solving for PCA. The eigenvectors of Toeplitz matrices like this one are (approximately) sinusoids. (Figure: the first 32 eigenvectors.)
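
This is easy to check numerically; a sketch with an (illustrative) exponentially tapering Toeplitz covariance:

% Eigenvectors of a smooth symmetric Toeplitz covariance look like sinusoids
N = 64;
c = exp(-(0:N-1) / 10);              % tapering first row/column
C = toeplitz(c);                     % symmetric Toeplitz covariance model
[U, D] = eig(C);
[~, idx] = sort(diag(D), 'descend');
plot(U(:, idx(1:4)))                 % the leading eigenvectors: roughly sinusoidal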

And ditto with images Analysis of coherent images results in 2D sinusoids

So now you know. The Fourier transform is an optimal decomposition for time series; in fact you will often skip PCA and just do a DFT. There is also a loose connection with our perceptual system: we kind of use similar filters in our ears and eyes (but we'll make that connection later).

Recap Principal Component Analysis Get used to it! Decorrelates multivariate data, finds useful components, reduces dimensionality Many ways to get to it Knowing what to use with your data helps Interesting connection to Fourier transform

Check these out for more Eigenfaces http://en.wikipedia.org/wiki/eigenface http://www.cs.ucsb.edu/~mturk/papers/mturk-cvpr91.pdf http://www.cs.ucsb.edu/~mturk/papers/jcn.pdf Incremental SVD http://www.merl.com/publications/tr2002-024/ EM-PCA http://cs.nyu.edu/~roweis/papers/empca.ps.gz