PCA: Principal Components Analysis. Ron Parr, CPS 271.

PCA. Ron Parr, CPS 271. With thanks to Tom Mitchell.

Principal Components Analysis
Idea: Given data points in d-dimensional space, project them into a lower-dimensional space while preserving as much information as possible.
E.g., find the best planar approximation to 3D data.
E.g., find the best planar approximation to 10^4-dimensional data.
In particular, choose the projection that minimizes the squared error in reconstructing the original data.
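To make the "best planar approximation" idea concrete, here is a minimal NumPy sketch (the data and variable names are synthetic, invented for illustration; this is not code from the lecture) that fits a plane to noisy 3D data and reports the squared reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 3D data that lies close to a 2D plane, plus a little noise.
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(500, 3))

x_bar = X.mean(axis=0)
A = X - x_bar                      # center the data

# Top-2 principal directions via SVD of the centered data.
_, _, Vt = np.linalg.svd(A, full_matrices=False)
U2 = Vt[:2].T                      # 3 x 2 basis for the best-fit plane

X_hat = x_bar + A @ U2 @ U2.T      # project onto the plane and reconstruct
print("squared reconstruction error:", np.sum((X - X_hat) ** 2))
```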

Why do we care? Lower-dimensional representations permit:
- Compression
- Noise filtering
- Preprocessing for classification: reduces the feature space dimension, allows simpler classifiers, and possibly gives better generalization
- May facilitate simple (nearest neighbor) methods

Review of a Few Linear Algebra Facts
A set of vectors is orthonormal if:
- All vectors in the set have norm 1
- Any two different vectors have dot product 0
Any vector in a linear space can be expressed as a weighted combination of norm-1 vectors, specifically, of vectors that form a basis for the space. (See the sketch below.)
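A minimal NumPy sketch of these facts (the basis and vector below are random, invented for illustration): for an orthonormal basis, the weights in the combination are simply dot products.

```python
import numpy as np

rng = np.random.default_rng(1)

# Build an orthonormal basis for R^4 via QR decomposition of a random matrix.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))

# Orthonormality: all columns have norm 1, distinct columns have dot product 0.
assert np.allclose(Q.T @ Q, np.eye(4))

# Any vector is a weighted combination of the basis vectors,
# with weights given by dot products against the basis.
x = rng.normal(size=4)
z = Q.T @ x                    # weights z_i = q_i . x
assert np.allclose(Q @ z, x)   # x = sum_i z_i q_i
```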

PCA: Find Projections to Minimize Reconstruction Error
Assume the data is a set of d N-dimensional vectors $x^1, x^2, ..., x^d$. Given any orthonormal basis $u_1, ..., u_N$, we can always express the kth vector as
$x^k = \bar{x} + \sum_{i=1}^{N} z_i^k u_i$, with $z_i^k = u_i^T (x^k - \bar{x})$ and $u_i^T u_j = \delta_{ij}$ (a compact way of indicating orthonormality).
The mean is $\bar{x} = \frac{1}{d} \sum_{i=1}^{d} x^i$.
PCA: given M < N, find $(u_1, ..., u_M)$ that minimizes
$E_M = \sum_{k=1}^{d} \| x^k - \hat{x}^k \|^2$, where $\hat{x}^k = \bar{x} + \sum_{i=1}^{M} z_i^k u_i$.

Review: Eigenvectors
Matrix A has eigenvector u with eigenvalue $\lambda$ if $Au = \lambda u$.
For symmetric A, the (normalized) eigenvectors:
- Are orthogonal
- Have real eigenvalues
- Form an orthonormal basis for the space on which A acts
(See Appendix C of the text.)
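A small sketch of the eigenvector review (the symmetric matrix is random, chosen only for illustration), using NumPy's numpy.linalg.eigh for symmetric matrices:

```python
import numpy as np

rng = np.random.default_rng(2)

B = rng.normal(size=(5, 5))
A = (B + B.T) / 2                          # a random symmetric matrix

eigvals, U = np.linalg.eigh(A)             # eigh is specialized to symmetric/Hermitian matrices

# Real eigenvalues, orthonormal eigenvectors, and A u_i = lambda_i u_i.
assert eigvals.dtype == np.float64
assert np.allclose(U.T @ U, np.eye(5))
assert np.allclose(A @ U, U @ np.diag(eigvals))
```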

Review: Projection
An orthonormal basis makes projection trivial. Suppose U is our basis (formed by the first k eigenvectors) and we want to project a new x:
$w = (U^T U)^{-1} U^T x = U^T x$
Note: we typically assume x has already had the mean subtracted.

PCA: given M < N, find $(u_1, ..., u_M)$ that minimizes $E_M$, where $\hat{x}^k = \bar{x} + \sum_{i=1}^{M} z_i^k u_i$. [figure] 2D data with principal directions $u_1$ and $u_2$.
Note that we get zero error if we keep the full basis, so $E_M$ measures how much we left out by dropping the vectors $u_{M+1}, ..., u_N$:
$E_M = \sum_{j=1}^{d} \sum_{i=M+1}^{N} (z_i^j)^2 = \sum_{i=M+1}^{N} \sum_{j=1}^{d} \left( u_i^T (x^j - \bar{x}) \right)^2 = \sum_{i=M+1}^{N} \sum_{j=1}^{d} u_i^T (x^j - \bar{x}) (x^j - \bar{x})^T u_i = \sum_{i=M+1}^{N} u_i^T \Sigma u_i$
where the covariance matrix is $\Sigma = \sum_{k=1}^{d} (x^k - \bar{x})(x^k - \bar{x})^T$.
Equivalent problem: maximize the variance in the dimensions we keep.
$E_M$ is minimized when each $u_i$ is an eigenvector of $\Sigma$, i.e., when $\Sigma u_i = \lambda_i u_i$.
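A minimal sketch of the claim that eigenvectors of $\Sigma$ minimize $E_M$ (synthetic data, invented names; a sanity check rather than a proof): the top eigenvectors give a lower reconstruction error than a random orthonormal basis of the same size, and the error equals the sum of the dropped eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data with unequal variance along different directions.
X = rng.normal(size=(200, 6)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])
x_bar = X.mean(axis=0)
A = X - x_bar

Sigma = A.T @ A                         # (unnormalized) covariance matrix
eigvals, U = np.linalg.eigh(Sigma)      # ascending eigenvalues
U_top = U[:, ::-1][:, :2]               # top-2 eigenvectors

def E_M(basis):
    """Squared reconstruction error when projecting onto the given orthonormal basis."""
    X_hat = x_bar + A @ basis @ basis.T
    return np.sum((X - X_hat) ** 2)

Q, _ = np.linalg.qr(rng.normal(size=(6, 2)))   # a random orthonormal 2D basis
print("top eigenvectors :", E_M(U_top))
print("random basis     :", E_M(Q))
# The eigenvector basis never does worse; its error is the sum of the dropped eigenvalues.
print("sum of dropped eigenvalues:", eigvals[:-2].sum())
```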

Justifying the Use of Eigenvectors
We want to minimize $u^T \Sigma u$ subject to $u^T u = 1$.
Use Lagrange multipliers: minimize $u^T \Sigma u - \lambda u^T u$.
Take the gradient and set it to 0: $\Sigma u - \lambda u = 0$.
This holds exactly when u is an eigenvector of $\Sigma$ with eigenvalue $\lambda$.
Minimizing $E_M = \sum_{i=M+1}^{N} u_i^T \Sigma u_i$ therefore selects eigenvectors of $\Sigma$ ($\Sigma u_i = \lambda_i u_i$: eigenvector and eigenvalue of $\Sigma$), giving
$E_M = \sum_{i=M+1}^{N} \lambda_i$.

PCA algorithm (a NumPy sketch follows):
1. X <- create the N x d data matrix, with one data point per column
2. A <- subtract the mean $\bar{x}$ from each column of X
3. $\Sigma$ <- covariance matrix of A
4. Find the eigenvectors and eigenvalues of $\Sigma$
5. The principal components are the M eigenvectors with the largest eigenvalues
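A minimal NumPy sketch of the five steps, under the slide's convention that data points are the columns of X (the data and names here are synthetic, invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

N, d, M = 10, 50, 3                    # dimension, number of points, components kept

# 1. Create the N x d data matrix (one data point per column).
X = rng.normal(size=(N, d))

# 2. Subtract the mean from each column.
x_bar = X.mean(axis=1, keepdims=True)
A = X - x_bar

# 3. Covariance matrix of the centered data.
Sigma = A @ A.T                        # N x N scatter matrix (scaling by 1/d changes nothing below)

# 4. Eigenvectors and eigenvalues of Sigma.
eigvals, eigvecs = np.linalg.eigh(Sigma)

# 5. Principal components = the M eigenvectors with the largest eigenvalues.
order = np.argsort(eigvals)[::-1]
U = eigvecs[:, order[:M]]              # N x M

# Projection of the centered data onto the principal components: w = U^T (x - x_bar).
Z = U.T @ A                            # M x d low-dimensional representation
```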

PCA Example: [figure] 2D data with its mean, first eigenvector, and second eigenvector.
PCA Example: [figure] The same data reconstructed using only the first eigenvector (M=1), with the mean and both eigenvectors shown.
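To make the M=1 example concrete, here is a small sketch (synthetic 2D data, invented names) that reconstructs each point as $\hat{x} = \bar{x} + u_1 u_1^T (x - \bar{x})$:

```python
import numpy as np

rng = np.random.default_rng(5)

# Correlated 2D data.
X = rng.normal(size=(100, 2)) @ np.array([[2.0, 1.5], [0.0, 0.5]])
x_bar = X.mean(axis=0)
A = X - x_bar

eigvals, eigvecs = np.linalg.eigh(A.T @ A)
u1 = eigvecs[:, -1]                    # first eigenvector (largest eigenvalue)

# Reconstruct each point from its projection onto u1 alone (M = 1).
X_hat = x_bar + np.outer(A @ u1, u1)
print("mean squared reconstruction error:", np.mean(np.sum((X - X_hat) ** 2, axis=1)))
```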

Applying PCA
Example data set: images of faces (the famous Eigenfaces approach [Turk & Pentland], [Sirovich & Kirby]).
Each datum is a point in image space; each point is a vector of luminance values.
The vectors are long, e.g., 256 x 256 = 64K. These form the columns of A, and $\Sigma = AA^T$.
Problem: $AA^T$ is unreasonably large!

A Clever Workaround
Note that d << N (= 64K). Use $L = A^T A$ instead of $\Sigma = AA^T$.
Suppose v is an eigenvector of L with eigenvalue $\gamma$. Then Av is an eigenvector of $\Sigma$:
$Lv = \gamma v$
$A^T A v = \gamma v$
$A A^T A v = \gamma A v$
$\Sigma (Av) = \gamma (Av)$
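A minimal sketch of this workaround (a random matrix stands in for the face images; names are invented): compute eigenvectors of the small d x d matrix $A^T A$, then map them through A to get eigenvectors of the huge N x N matrix $AA^T$.

```python
import numpy as np

rng = np.random.default_rng(6)

N, d = 4096, 20                       # long image vectors, few faces
A = rng.normal(size=(N, d))
A = A - A.mean(axis=1, keepdims=True) # center (subtract the mean image)

L = A.T @ A                           # small d x d matrix instead of the huge N x N AA^T
gammas, V = np.linalg.eigh(L)

# Keep the non-degenerate eigenvectors (centering makes one eigenvalue ~0),
# map each small eigenvector v to A v, and normalize to unit length.
keep = gammas > 1e-10 * gammas.max()
gammas = gammas[keep]
U = A @ V[:, keep]
U = U / np.linalg.norm(U, axis=0)

# Check on the top eigenvector: Sigma (A v) = gamma (A v), without ever forming Sigma.
i = -1
assert np.allclose(A @ (A.T @ U[:, i]), gammas[i] * U[:, i])
```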

Application to Eigenfaces
d = hundreds to thousands of faces. Keep M ~ d/10 eigenvectors (eigenfaces).
Achieve:
- Low reconstruction error
- Relatively high classification accuracy (across faces)
- A robust measure of "faceness"

Application to Collaborative Filtering
Collaborative filtering: use the preferences/ratings from a set of users to predict preferences/ratings for a new user. Examples: Amazon, Netflix, etc.
Collaborative filtering as PCA (a small sketch follows):
- Customers span the columns
- Products span the rows
- Principal components are customer types
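A minimal sketch of the collaborative-filtering-as-PCA view (a tiny made-up ratings matrix with products as rows and customers as columns; all names and numbers are invented for illustration): a rank-2 approximation recovers the dominant "customer types".

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up ratings: 8 products (rows) x 30 customers (columns),
# generated from 2 underlying "customer types" plus noise.
types = rng.normal(size=(8, 2))                  # how each type rates each product
weights = rng.normal(size=(2, 30))               # how much of each type each customer is
R = types @ weights + 0.1 * rng.normal(size=(8, 30))

r_bar = R.mean(axis=1, keepdims=True)
A = R - r_bar

# Principal components of the ratings via eigenvectors of A A^T.
eigvals, eigvecs = np.linalg.eigh(A @ A.T)
U = eigvecs[:, np.argsort(eigvals)[::-1][:2]]    # top-2 "customer type" directions

R_hat = r_bar + U @ (U.T @ A)                    # rank-2 reconstruction of the ratings
print("relative reconstruction error:",
      np.linalg.norm(R - R_hat) / np.linalg.norm(R))
```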

Relationship to SVD
SVD factors a matrix: [U, S, V] = SVD(X), with $X = U S V^T$.
- U contains the eigenvectors of $XX^T$
- V contains the eigenvectors of $X^T X$
- S (the singular values) contains the square roots of the eigenvalues
We can obtain the PCA solution by subtracting the mean and then running SVD (see the sketch below).

Summary of PCA Uses
- Data compression (represent the entire data set as coefficients for a small number of principal components)
- Noise filtering (assume low-eigenvalue components correspond to noise)
- Feature selection for supervised learning (assumes low-eigenvalue components are noise/irrelevant features)
- Nearest neighbor classification (assumes the subspace of principal components is a more natural space in which to measure distances)
- Direct classification (assumes distance to the span of the principal components is an indicator of class membership)
- Visualization (assumes the first 2 or 3 principal components show the interesting relationships that exist in the data)
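A minimal sketch of the SVD relationship (synthetic data, invented names), following the slides' convention that data points are columns: after subtracting the mean, the left singular vectors match the eigenvectors of $AA^T$ and the squared singular values match its eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(8)

N, d = 5, 40
X = rng.normal(size=(N, d))                    # data points as columns, as in the slides
A = X - X.mean(axis=1, keepdims=True)          # subtract the mean first

# PCA via the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(A @ A.T)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# PCA via SVD of the centered data.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Same eigenvalues (squared singular values) and same directions (up to sign).
assert np.allclose(S ** 2, eigvals)
print("directions agree up to sign:",
      np.allclose(np.abs(U.T @ eigvecs), np.eye(N), atol=1e-6))
```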

Shortcomings
Requires carefully controlled data:
- All data must be aligned (e.g., all faces centered in the frame)
- No missing entries (handled awkwardly in collaborative filtering)
It is a completely knowledge-free method (sometimes this is good).
It is purely linear; e.g., it doesn't know that faces are wrapped around 3D objects (heads).
It makes no effort to preserve class distinctions.

PCA Problem Data Set: [figure] a labeled data set on which PCA fails, because PCA doesn't know about the labels!
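A small sketch of the kind of problem data set the slide refers to (synthetic data, invented for illustration; not the actual figure from the slides): the two classes differ only along a low-variance direction, so projecting onto the first principal component throws away exactly the information that separates them.

```python
import numpy as np

rng = np.random.default_rng(9)

# Two classes: large variance along x (uninformative), small separation along y (informative).
n = 200
class0 = np.column_stack([rng.normal(0, 10, n), rng.normal(-1, 0.2, n)])
class1 = np.column_stack([rng.normal(0, 10, n), rng.normal(+1, 0.2, n)])
X = np.vstack([class0, class1])
y = np.r_[np.zeros(n), np.ones(n)]

A = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(A, full_matrices=False)
u1 = Vt[0]                                  # first principal component (the x direction)

z = A @ u1                                  # 1D PCA projection, labels ignored
# After projection the class means are nearly identical: the separation lived in the dropped direction.
print("projected class means:", z[y == 0].mean(), z[y == 1].mean())
print("original y-coordinate class means:", X[y == 0, 1].mean(), X[y == 1, 1].mean())
```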

PCA Conclusions
PCA finds an orthonormal basis for the data and sorts the dimensions in order of importance. Discard the low-significance dimensions to:
- Get a compact description
- Ignore noise
- Improve classification (hopefully)
Not magic:
- Doesn't know class labels
- Can only capture linear variations
- One of many types of dimensionality reduction!