Lecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University



This week: Motivation, PCA algorithms, Applications, PCA shortcomings, Autoencoders, Kernel PCA

PCA Applications: Data Visualization, Data Compression, Noise Reduction, Learning, Anomaly Detection

Data Visualization. Example: given 53 blood and urine measurements (features) from 65 people, how can we visualize the measurements?

Data Visualization. Matrix format (65 x 53): one row per instance (A1, A2, A3, ...), one column per feature (H-WBC, H-RBC, H-Hgb, H-Hct, H-MCV, H-MCH, H-MCHC, ...). Difficult to see the correlations between the features...

Data Visualization. Spectral format (65 curves, one for each person), with the measurement index on the x-axis and its value on the y-axis. Difficult to compare the different patients...

Data Visualization. Spectral format (53 plots, one for each feature, e.g., H-Bands), with the person on the x-axis and the feature value on the y-axis. Difficult to see the correlations between the features...

Data Visualization. Bi-variate plots (e.g., C-LDH vs. C-Triglycerides) and tri-variate plots (e.g., M-EPI vs. C-LDH vs. C-Triglycerides). How can we visualize the other variables? It is difficult to see in 4 or higher dimensional spaces...

Data Visualization. Is there a representation better than the coordinate axes? Is it really necessary to show all 53 dimensions?... what if there are strong correlations between the features? How could we find the smallest subspace of the 53-D space that keeps the most information about the original data? A solution: Principal Component Analysis.
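
As a sketch of what "keeping the most information" could mean in practice (my own illustration with synthetic data standing in for the 65 x 53 measurement matrix, not part of the original slides): look at the eigenvalue spectrum of the features' covariance and count how few directions already explain most of the variance.

```python
import numpy as np

# Hypothetical stand-in for the 65 x 53 blood/urine matrix: 65 people,
# 53 correlated features driven by roughly 10 latent factors plus noise.
rng = np.random.default_rng(0)
data = (rng.normal(size=(65, 10)) @ rng.normal(size=(10, 53))
        + 0.1 * rng.normal(size=(65, 53)))

centered = data - data.mean(axis=0)
cov = centered.T @ centered / len(centered)
eigvals = np.linalg.eigvalsh(cov)[::-1]          # eigenvalues, largest first

explained = np.cumsum(eigvals) / eigvals.sum()   # cumulative explained variance
k = int(np.searchsorted(explained, 0.95)) + 1    # smallest k reaching 95%
print(f"{k} of 53 dimensions keep 95% of the variance")
```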

PCA algorithms

Principal Component Analysis. PCA: orthogonal projection of the data onto a lower-dimensional linear space that maximizes the variance of the projected data (purple line) and minimizes the mean squared distance between each data point and its projection (sum of blue lines).

Principal Component Analysis. Idea: given data points in a d-dimensional space, project them into a lower-dimensional space while preserving as much information as possible. For example, find the best planar approximation to 3D data, or the best 1-D approximation to 10^4-D data. In particular, choose the projection that minimizes the squared error in reconstructing the original data.

Principal Component Analysis. PCA vectors originate from the center of mass. Principal component #1 points in the direction of the largest variance. Each subsequent principal component is orthogonal to the previous ones and points in the direction of the largest variance of the residual subspace.

2D Gaussian dataset

1st PCA axis

2nd PCA axis

PCA algorithm I (sequential). Given the centered data {x_1, ..., x_m}, compute the principal vectors. The 1st PCA vector is
$w_1 = \arg\max_{\|w\|=1} \frac{1}{m}\sum_{i=1}^m (w^T x_i)^2$
(we maximize the variance of the projection of x), and the 2nd PCA vector is
$w_2 = \arg\max_{\|w\|=1} \frac{1}{m}\sum_{i=1}^m \big[w^T (x_i - w_1 w_1^T x_i)\big]^2$
(we maximize the variance of the projection in the residual subspace). With one component, the PCA reconstruction is $x' = w_1 (w_1^T x)$.

PCA algorithm I (sequential). Given $w_1, \dots, w_{k-1}$, the k-th principal vector is computed in the same way:
$w_k = \arg\max_{\|w\|=1} \frac{1}{m}\sum_{i=1}^m \Big[w^T \Big(x_i - \sum_{j=1}^{k-1} w_j w_j^T x_i\Big)\Big]^2$,
i.e., we maximize the variance of the projection in the residual subspace. With two components, the PCA reconstruction is $x' = w_1 (w_1^T x) + w_2 (w_2^T x)$.
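
To make the sequential procedure concrete, here is a minimal NumPy sketch (my own illustration, not from the slides): it finds each w_k as the top eigenvector of the covariance of the current residual and then deflates, which is equivalent to the argmax above. It assumes the data are already centered and stored as the rows of X.

```python
import numpy as np

def sequential_pca(X, k):
    """Sequential PCA sketch: X is an (m, d) array of already-centered rows.
    Returns a (k, d) array whose rows are the principal vectors w_1, ..., w_k."""
    m, _ = X.shape
    residual = X.copy()
    W = []
    for _ in range(k):
        # Top eigenvector of the residual covariance = direction that maximizes
        # the variance of the projection in the current residual subspace.
        C = residual.T @ residual / m
        eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
        w = eigvecs[:, -1]                     # direction of largest variance
        W.append(w)
        # Deflate: remove the component along w from every data point.
        residual = residual - np.outer(residual @ w, w)
    return np.array(W)
```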

PCA algorithm II (sample covariance matrix). Given data {x_1, ..., x_m}, compute the covariance matrix
$\Sigma = \frac{1}{m}\sum_{i=1}^m (x_i - \bar{x})(x_i - \bar{x})^T$, where $\bar{x} = \frac{1}{m}\sum_{i=1}^m x_i$.
The PCA basis vectors are the eigenvectors of $\Sigma$; the larger the eigenvalue, the more important the eigenvector.

PCA algorithm II (sample covariance matrix). PCA algorithm(X, k): return the top k eigenvalues/eigenvectors.
% X = N x m data matrix; each data point x_i is a column vector, i = 1..m
x̄ ← (1/m) Σ_{i=1..m} x_i
X ← subtract the mean x̄ from each column vector x_i in X
Σ ← X X^T (covariance matrix of X)
{λ_i, u_i}_{i=1..N} ← eigenvalues/eigenvectors of Σ, with λ_1 ≥ λ_2 ≥ ... ≥ λ_N
Return {λ_i, u_i}_{i=1..k} % the top k PCA components
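
A minimal NumPy sketch of the pseudocode above, assuming (as on the slide) that X stores one data point per column; the function name pca_cov and the 1/m normalization are my own choices.

```python
import numpy as np

def pca_cov(X, k):
    """PCA via the sample covariance matrix, following the pseudocode above.
    X is an (N, m) array with one data point per column.
    Returns the top-k eigenvalues and the (N, k) matrix of principal vectors."""
    m = X.shape[1]
    x_bar = X.mean(axis=1, keepdims=True)      # mean column vector
    Xc = X - x_bar                             # subtract the mean from each column
    Sigma = Xc @ Xc.T / m                      # N x N sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]      # indices of the top-k eigenvalues
    return eigvals[order], eigvecs[:, order]
```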

PCA algorithm III (SVD of the data matrix). Take the Singular Value Decomposition of the centered data matrix X (features x samples): X = U S V^T. The leading singular values and vectors capture the significant structure of the samples; the trailing ones mostly capture noise.

PCA algorithm III. Columns of U: the principal vectors {u^(1), ..., u^(k)}; they are orthogonal and have unit norm, so U^T U = I. The data can be reconstructed using linear combinations of {u^(1), ..., u^(k)}. Matrix S: diagonal; it shows the importance of each eigenvector. Columns of V^T: the coefficients for reconstructing the samples.
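
A possible NumPy rendering of the SVD route (a sketch, assuming a column-per-sample data matrix and my own function name pca_svd); np.linalg.svd returns exactly the U, S, V^T factors described above.

```python
import numpy as np

def pca_svd(X, k):
    """PCA via SVD of the centered data matrix (one data point per column).
    Returns the top-k principal directions (columns of U) and the per-sample
    reconstruction coefficients."""
    mean = X.mean(axis=1, keepdims=True)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = U[:, :k]                 # principal vectors, U^T U = I
    coeffs = S[:k, None] * Vt[:k, :]      # column j = coefficients for sample j
    # Rank-k reconstruction of the data: components @ coeffs + mean
    return components, coeffs
```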

Applications

Face Recognition

Face Recognition. We want to identify a specific person based on a facial image, robustly to glasses, lighting, etc. We can't just use the given 256 x 256 pixels directly.

Applying PCA: Eigenfaces. Method A: build a PCA subspace for each person and check which subspace reconstructs the test image best. Method B: build one PCA database for the whole dataset and then classify based on the weights. Example data set: images of faces, as in the famous Eigenface approach [Turk & Pentland], [Sirovich & Kirby]. Each face x is 256 x 256 luminance values, i.e., x lives in R^(256 x 256), viewed as a 64K-dimensional vector. Form the centered data matrix X = [x_1, ..., x_m] (m faces, each 256 x 256 real values) and compute Σ = X X^T. Problem: Σ is 64K x 64K, which is HUGE!!!

Computational Complexity. Suppose we have m instances, each of size N; for Eigenfaces, m = 500 faces, each of size N = 64K. Given the N x N covariance matrix Σ, we can compute all N eigenvectors/eigenvalues in O(N^3), or the first k eigenvectors/eigenvalues in O(k N^2). But if N = 64K, this is EXPENSIVE!

A Clever Workaround. Note that m << 64K, so use L = X^T X (an m x m matrix) instead of Σ = X X^T, where X = [x_1, ..., x_m] holds the m faces of 256 x 256 real values. If v is an eigenvector of L, then Xv is an eigenvector of Σ. Proof: if L v = λ v, then X^T X v = λ v, so X (X^T X v) = X (λ v) = λ X v, i.e., (X X^T)(X v) = λ (X v).
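
A NumPy sketch of this workaround (my own code, hypothetical function name): it eigendecomposes the small m x m matrix L = X^T X and maps its eigenvectors back through X; the renormalization is needed because Xv has norm sqrt(lambda) rather than 1. It assumes the columns of X are already-centered faces.

```python
import numpy as np

def eigenfaces_small_trick(X, k):
    """X is (N, m) with m << N (e.g., N = 64K pixels, m = 500 faces),
    columns already centered. Returns the top-k eigenvalues of Sigma = X X^T
    and the corresponding (N, k) eigenvectors, without ever forming Sigma."""
    L = X.T @ X                               # small m x m matrix
    eigvals, V = np.linalg.eigh(L)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]
    U = X @ V[:, order]                       # if L v = lambda v, then X v is an
                                              # eigenvector of X X^T (see proof above)
    U /= np.linalg.norm(U, axis=0)            # rescale each column to unit norm
    return eigvals[order], U
```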

Principal Components (Method B)

Principal Components (Method B): faster if we train with only people without glasses and the same lighting conditions.

Shortcomings. Requires carefully controlled data: all faces centered in the frame, all the same size, and there is some sensitivity to angle. The method is completely knowledge-free (sometimes this is good!): it doesn't know that faces are wrapped around 3D objects (heads), and it makes no effort to preserve class distinctions.

Happiness subspace (method A)

Disgust subspace (method A)

Facial Expression Recognition Movies

Facial Expression Recognition Movies

Facial Expression Recognition Movies

Image Compression

Original Image. Divide the original 372 x 492 image into patches; each patch is an instance. View each patch as a 144-D vector.

L2 error and PCA dimension
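
As an illustration of the compression experiment in the following slides, here is a hedged sketch (assumed patch matrix of shape m x 144 and a hypothetical function name) that keeps k principal components per patch and reports the mean L2 reconstruction error. Sweeping k from 60 down to 1 reproduces the kind of error-vs-dimension trade-off plotted here.

```python
import numpy as np

def pca_compress_patches(patches, k):
    """Compress 144-D image patches (one per row) down to k numbers each,
    reconstruct them, and return the reconstructions and mean L2 error."""
    mean = patches.mean(axis=0)
    Xc = patches - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k]                                   # k x 144 basis of the subspace
    codes = Xc @ W.T                             # k coefficients per patch
    recon = codes @ W + mean                     # back to 144-D
    l2_error = np.linalg.norm(patches - recon, axis=1).mean()
    return recon, l2_error
```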

PCA compression: 144D => 60D

PCA compression: 144D => 16D

16 most important eigenvectors

PCA compression: 144D => 6D

6 most important eigenvectors

PCA compression: 144D => 3D

3 most important eigenvectors

PCA compression: 144D => 1D

60 most important eigenvectors: they look like the discrete cosine bases of JPEG!

2D Discrete Cosine Basis http://en.wikipedia.org/wiki/discrete_cosine_transform

Noise Filtering

Noise Filtering: project each noisy point onto the top principal components and reconstruct, x → x̃ = U U^T x.
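
A minimal sketch of this filtering step (my own code), assuming the noisy samples are stored as columns and that k = 15 components are kept, as in the denoised image a few slides below.

```python
import numpy as np

def pca_denoise(X_noisy, k=15):
    """Project each (centered) sample onto the top-k principal components and
    reconstruct: x_tilde = U U^T (x - mean) + mean. X_noisy is (N, m)."""
    mean = X_noisy.mean(axis=1, keepdims=True)
    Xc = X_noisy - mean
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Uk = U[:, :k]                      # top-k principal directions
    return Uk @ (Uk.T @ Xc) + mean     # noise outside the subspace is discarded
```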

Noisy image

Denoised image using 15 PCA components

PCA Shortcomings

Problematic Data Set for PCA: PCA doesn't know labels!

PCA vs. Fisher Linear Discriminant. Principal Component Analysis picks the direction of highest variance, which can be bad for discriminability; the Fisher Linear Discriminant picks a direction with smaller overall variance but good discriminability between the classes. (Slide by Javier Hernandez Rivera.)
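
To see the contrast numerically, here is a small, purely illustrative comparison using scikit-learn; the toy Gaussian classes and all parameter choices below are my own, not from the slides.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two overlapping classes that are elongated along x but separated along y.
rng = np.random.default_rng(0)
cov = [[3.0, 0.0], [0.0, 0.2]]
X = np.vstack([rng.multivariate_normal([0.0, 0.0], cov, 200),
               rng.multivariate_normal([0.0, 1.0], cov, 200)])
y = np.array([0] * 200 + [1] * 200)

z_pca = PCA(n_components=1).fit_transform(X)                             # ignores labels
z_fld = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)  # uses labels

# PCA projects onto the high-variance x-axis and mixes the classes;
# FLD projects onto a direction close to the y-axis and separates them.
```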

Problematic Data Set for PCA: PCA cannot capture NON-LINEAR structure!

PCA Conclusions. PCA finds an orthonormal basis for the data, sorts dimensions in order of importance, and discards low-significance dimensions. Uses: get a compact description, ignore noise, improve classification (hopefully). Not magic: it doesn't know class labels and can only capture linear variations. One of many tricks to reduce dimensionality!