Deriving Principal Component Analysis (PCA)

10-606 Mathematical Foundations for Machine Learning
Machine Learning Department
School of Computer Science
Carnegie Mellon University

Deriving Principal Component Analysis (PCA)

Matt Gormley
Lecture 11
Oct. 3, 2018

Reminders
- Quiz 1: Linear Algebra (today)
- Homework 3: Matrix Calculus + Probability. Out: Wed, Oct. 3; Due: Wed, Oct. at 11:59pm
- Quiz 2: Matrix Calculus + Probability. In-class, Wed, Oct. 3

Q&A

DIMENSIONALITY REDUCTION

PCA Outline
- Dimensionality Reduction
  - High-dimensional data
  - Learning (low dimensional) representations
- Principal Component Analysis (PCA)
  - Examples: 2D and 3D
  - Data for PCA
  - PCA Definition
  - Objective functions for PCA
  - PCA, Eigenvectors, and Eigenvalues
  - Algorithms for finding Eigenvectors / Eigenvalues
- PCA Examples
  - Face Recognition
  - Image Compression

High Dimension Data Examples of high dimensional data: High resolution images (millions of pixels)

High Dimension Data Examples of high dimensional data: Multilingual News Stories (vocabulary of hundreds of thousands of words)

High Dimension Data Examples of high dimensional data: Brain Imaging Data (0s of MBs per scan) Image from (Wehbe et al., 2014) Image from https://pixabay.com/en/brain-mrt-magnetic-resonance-imaging-179/

High Dimension Data Examples of high dimensional data: Customer Purchase Data

Learning Representations
PCA, Kernel PCA, ICA: powerful unsupervised learning techniques for extracting hidden (potentially lower dimensional) structure from high dimensional datasets.
Useful for:
- Visualization
- More efficient use of resources (e.g., time, memory, communication)
- Statistical: fewer dimensions → better generalization
- Noise removal (improving data quality)
- Further processing by machine learning algorithms
Slide from Nina Balcan

PRINCIPAL COMPONENT ANALYSIS (PCA)

PCA Outline
- Dimensionality Reduction
  - High-dimensional data
  - Learning (low dimensional) representations
- Principal Component Analysis (PCA)
  - Examples: 2D and 3D
  - Data for PCA
  - PCA Definition
  - Objective functions for PCA
  - PCA, Eigenvectors, and Eigenvalues
  - Algorithms for finding Eigenvectors / Eigenvalues
- PCA Examples
  - Face Recognition
  - Image Compression

Principal Component Analysis (PCA)
When the data lies on or near a low d-dimensional linear subspace, the axes of this subspace are an effective representation of the data. Identifying these axes is known as Principal Components Analysis, and they can be obtained using classic matrix computation tools (eigendecomposition or singular value decomposition).
Slide from Nina Balcan

2D Gaussian dataset. Slide from Barnabas Poczos

1st PCA axis. Slide from Barnabas Poczos

2nd PCA axis. Slide from Barnabas Poczos
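
The three slides above are figures only. As a rough illustration (not from the deck), a NumPy sketch that builds a similar correlated 2D Gaussian cloud and recovers its two PCA axes might look like this; the covariance values and random seed are arbitrary choices of mine:

```python
import numpy as np

# Correlated 2D Gaussian data, similar in spirit to the slide's example
rng = np.random.default_rng(0)
cov = np.array([[3.0, 1.5],
                [1.5, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=500)   # N x 2

# Center the data, then eigendecompose the sample covariance matrix
Xc = X - X.mean(axis=0)
S = (Xc.T @ Xc) / len(Xc)
eigvals, eigvecs = np.linalg.eigh(S)          # eigh returns ascending order
order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("1st PCA axis:", eigvecs[:, 0])  # direction of maximum variance
print("2nd PCA axis:", eigvecs[:, 1])  # orthogonal direction
```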

Principal Component Analysis (PCA)
Whiteboard:
- Data for PCA
- PCA Definition
- Objective functions for PCA

Data for PCA

$$\mathcal{D} = \{\mathbf{x}^{(i)}\}_{i=1}^{N}, \qquad \mathbf{X} = \begin{bmatrix} (\mathbf{x}^{(1)})^T \\ (\mathbf{x}^{(2)})^T \\ \vdots \\ (\mathbf{x}^{(N)})^T \end{bmatrix}$$

We assume the data is centered, and that each axis has sample variance equal to one:

$$\boldsymbol{\mu} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}^{(i)} = \mathbf{0}, \qquad \sigma_j^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_j^{(i)}\right)^2 = 1$$
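
As a minimal sketch of the preprocessing this slide assumes (the function name `standardize` and the 1/N convention for sample variance are my own choices, mirroring the formulas above):

```python
import numpy as np

def standardize(X):
    """Center each feature to mean 0 and scale to sample variance 1
    (1/N convention), matching the assumption on the slide."""
    mu = X.mean(axis=0)                      # per-feature mean
    Xc = X - mu
    sigma = np.sqrt((Xc ** 2).mean(axis=0))  # per-feature std with 1/N
    return Xc / sigma, mu, sigma
```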

Sample Covariance Matrix

The sample covariance matrix is given by:

$$\Sigma_{jk} = \frac{1}{N}\sum_{i=1}^{N} \left(x_j^{(i)} - \mu_j\right)\left(x_k^{(i)} - \mu_k\right)$$

Since the data matrix is centered, we can rewrite this as:

$$\Sigma = \frac{1}{N} \mathbf{X}^T \mathbf{X}$$
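
A small NumPy check, not from the lecture, that the entry-wise definition and the matrix form agree on centered data (the random data and dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
Xc = X - X.mean(axis=0)          # center the data first
N, M = Xc.shape

# Entry-wise definition: Sigma_jk = (1/N) sum_i (x_j^(i) - mu_j)(x_k^(i) - mu_k)
Sigma_entry = np.array([[(Xc[:, j] * Xc[:, k]).mean() for k in range(M)]
                        for j in range(M)])

# Matrix form for centered data: Sigma = (1/N) X^T X
Sigma_matrix = (Xc.T @ Xc) / N

assert np.allclose(Sigma_entry, Sigma_matrix)
```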

Maximizing the Variance
Quiz: Consider the two projections below (Option A and Option B).
1. Which maximizes the variance?
2. Which minimizes the reconstruction error?

PCA Equivalence of Maximizing Variance and Minimizing Reconstruction Error
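
The whiteboard derivation itself is not transcribed; the standard argument, sketched here for a single unit-length direction $\mathbf{v}$ and centered data, runs as follows. The reconstruction of $\mathbf{x}^{(i)}$ onto the line spanned by $\mathbf{v}$ is $(\mathbf{v}^T\mathbf{x}^{(i)})\mathbf{v}$, so

$$\frac{1}{N}\sum_{i=1}^{N}\left\|\mathbf{x}^{(i)} - (\mathbf{v}^T\mathbf{x}^{(i)})\,\mathbf{v}\right\|^2 = \frac{1}{N}\sum_{i=1}^{N}\left\|\mathbf{x}^{(i)}\right\|^2 - \frac{1}{N}\sum_{i=1}^{N}\left(\mathbf{v}^T\mathbf{x}^{(i)}\right)^2 .$$

The first term does not depend on $\mathbf{v}$, and the second term is exactly the variance of the projection (the data is centered), so minimizing reconstruction error over unit vectors $\mathbf{v}$ is the same as maximizing the projected variance.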

Principal Component Analysis (PCA)
Whiteboard:
- PCA, Eigenvectors, and Eigenvalues
- Algorithms for finding Eigenvectors / Eigenvalues
- SVD: Relation of Singular Vectors to Eigenvectors

SVD for PCA

SVD for PCA
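
The SVD-for-PCA slides are whiteboard work and are not transcribed. A NumPy sketch (my own, with arbitrary example data) of the relation they refer to, between the singular vectors of the centered data matrix and the eigenvectors of the sample covariance:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
Xc = X - X.mean(axis=0)
N = len(Xc)

# SVD of the centered data matrix: Xc = U diag(s) V^T
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigendecomposition of the sample covariance Sigma = (1/N) X^T X
eigvals, eigvecs = np.linalg.eigh((Xc.T @ Xc) / N)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Right singular vectors are the principal directions (up to sign),
# and squared singular values / N are the eigenvalues of Sigma.
assert np.allclose(s ** 2 / N, eigvals)
assert np.allclose(np.abs(Vt), np.abs(eigvecs.T))
```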

Principal Component Analysis (PCA)
$\Sigma \mathbf{v} = \lambda \mathbf{v}$, so $\mathbf{v}$ (the first PC) is an eigenvector of the sample covariance matrix $\Sigma$.
Sample variance of the projection: $\mathbf{v}^T \Sigma \mathbf{v} = \lambda \mathbf{v}^T \mathbf{v} = \lambda$.
Thus, the eigenvalue $\lambda$ denotes the amount of variability captured along that dimension (aka the amount of energy along that dimension).
Eigenvalues: $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \cdots$
The 1st PC $\mathbf{v}_1$ is the eigenvector of the sample covariance matrix $\Sigma$ associated with the largest eigenvalue.
The 2nd PC $\mathbf{v}_2$ is the eigenvector of the sample covariance matrix $\Sigma$ associated with the second largest eigenvalue.
And so on.
Slide from Nina Balcan
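
A compact sketch putting the statements above into code: sort the eigenvectors of $\Sigma$ by eigenvalue and project onto the top k. The function name `pca_project` is mine, not from the lecture:

```python
import numpy as np

def pca_project(X, k):
    """Return the k-dim PCA codes, the top-k principal directions, and all
    eigenvalues (variance captured per direction). A sketch, not the lecture's code."""
    Xc = X - X.mean(axis=0)
    Sigma = (Xc.T @ Xc) / len(Xc)
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    order = np.argsort(eigvals)[::-1]      # largest eigenvalue first => 1st PC first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    V = eigvecs[:, :k]                     # columns v_1, ..., v_k
    Z = Xc @ V                             # N x k projected representation
    return Z, V, eigvals
```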

How Many PCs?
For M original dimensions, the sample covariance matrix is M x M, and has up to M eigenvectors. So M PCs.
Where does dimensionality reduction come from? We can ignore the components of lesser significance.
(Bar chart: percentage of variance captured by each of PC1 through PC10.)
You do lose some information, but if the eigenvalues are small, you don't lose much:
- M dimensions in the original data
- calculate M eigenvectors and eigenvalues
- choose only the first D eigenvectors, based on their eigenvalues
- the final data set has only D dimensions
Eric Xing @ CMU, 2006-2011
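
A small helper, assuming the eigenvalues are already computed, that picks D by cumulative fraction of variance captured; the 0.95 threshold and the name `choose_num_pcs` are illustrative choices of mine, not from the slide:

```python
import numpy as np

def choose_num_pcs(eigvals, target=0.95):
    """Smallest D whose top-D eigenvalues capture at least `target`
    of the total variance."""
    eigvals = np.sort(np.asarray(eigvals))[::-1]   # largest first
    frac = np.cumsum(eigvals) / eigvals.sum()      # cumulative variance fraction
    return int(np.searchsorted(frac, target) + 1)
```

Applied to the bar chart above, this is exactly "choose only the first D eigenvectors, based on their eigenvalues."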

PCA EXAMPLES
Slides from Barnabas Poczos. Original sources include: Karl Booksh's research group, Tom Mitchell, Ron Parr.

Slide from Barnabas Poczos Face recognition

Challenge: Facial Recognition
Want to identify a specific person, based on a facial image. Must be robust to glasses, lighting, etc. ⇒ Can't just use the given 256 x 256 pixels.
Slide from Barnabas Poczos

Applying PCA: Eigenfaces
Example data set: images of faces, the famous Eigenface approach [Turk & Pentland], [Sirovich & Kirby].
Method: build one PCA database for the whole dataset and then classify based on the weights.
$\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_m]$, with $m$ faces. Each face $\mathbf{x}$ is 256 x 256 real values (luminance at each location), i.e., $\mathbf{x} \in \mathbb{R}^{256 \times 256}$, viewed as a 64K-dimensional vector.
Slide from Barnabas Poczos
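
A hedged sketch of the eigenfaces recipe described on this slide, written with an SVD rather than an explicit covariance matrix; the function names are mine, and the real Turk & Pentland pipeline involves more preprocessing than shown here:

```python
import numpy as np

def eigenfaces(faces, k):
    """faces: m x d matrix, one flattened face per row (d = 256*256 in the
    slide's setup). Returns the mean face, the top-k eigenfaces, and each
    training face's k PCA weights."""
    mean_face = faces.mean(axis=0)
    A = faces - mean_face
    # SVD of the m x d matrix avoids forming the huge d x d covariance explicitly
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    E = Vt[:k]                  # k x d: the eigenfaces
    W = A @ E.T                 # m x k: weights of each training face
    return mean_face, E, W

def classify_face(x, mean_face, E, W, labels):
    """'Classify based on the weights': nearest training face in weight space."""
    w = (x - mean_face) @ E.T
    return labels[np.argmin(np.linalg.norm(W - w, axis=1))]
```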

Principal Components. Slide from Barnabas Poczos

Reconstructing: faster if we train with only people without glasses, under the same lighting conditions. Slide from Barnabas Poczos

Shortcomings
Requires carefully controlled data:
- All faces centered in frame
- Same size
- Some sensitivity to angle
Alternative: learn one set of PCA vectors for each angle; use the one with the lowest error.
Method is completely knowledge free (sometimes this is good!):
- Doesn't know that faces are wrapped around 3D objects (heads)
- Makes no effort to preserve class distinctions
Slide from Barnabas Poczos

Slide from Barnabas Poczos Image Compression

Original Image
Divide the original 372x492 image into patches: each patch is an instance that contains 12x12 pixels on a grid. View each patch as a 144-D vector.
Slide from Barnabas Poczos
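
A sketch of the patch-based compression experiment the following slides show, under the stated 12x12-patch setup; the function name `pca_compress_patches` and the default d=60 are my own choices:

```python
import numpy as np

def pca_compress_patches(img, patch=12, d=60):
    """Split a grayscale image into non-overlapping patch x patch blocks
    (144-D vectors for 12x12), keep the top-d principal components, and
    reconstruct the image from the compressed codes."""
    H = (img.shape[0] // patch) * patch
    W = (img.shape[1] // patch) * patch
    img = img[:H, :W].astype(float)

    # Rearrange into one row per patch
    blocks = img.reshape(H // patch, patch, W // patch, patch).swapaxes(1, 2)
    X = blocks.reshape(-1, patch * patch)

    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:d]                           # top-d eigenvectors (each 144-D)
    X_hat = (Xc @ V.T) @ V + mu          # 144D -> dD codes -> reconstruction

    out = X_hat.reshape(H // patch, W // patch, patch, patch).swapaxes(1, 2)
    return out.reshape(H, W)
```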

L2 error and PCA dim. Slide from Barnabas Poczos

PCA compression: 144D → 60D. Slide from Barnabas Poczos

PCA compression: 144D → 16D. Slide from Barnabas Poczos

16 most important eigenvectors. Slide from Barnabas Poczos

PCA compression: 144D → 6D. Slide from Barnabas Poczos

6 most important eigenvectors. Slide from Barnabas Poczos

PCA compression: 144D → 3D. Slide from Barnabas Poczos

3 most important eigenvectors. Slide from Barnabas Poczos

PCA compression: 144D → 1D. Slide from Barnabas Poczos

60 most important eigenvectors. Looks like the discrete cosine bases of JPG! Slide from Barnabas Poczos

2D Discrete Cosine Basis. http://en.wikipedia.org/wiki/discrete_cosine_transform Slide from Barnabas Poczos