Principal Component Analysis (PCA) CSC411/2515 Tutorial
Harris Chan
Based on previous tutorial slides by Wenjie Luo, Ladislav Rampasek
University of Toronto
hchan@cs.toronto.edu
October 19th, 2017

Overview
1. Motivation: Dimensionality Reduction; Two Perspectives on Good Transformations
2. PCA: Maximum Variance; Minimum Reconstruction Error
3. Applications of PCA: Demo
4. Summary

Dimensionality Reduction
We have some data X ∈ R^(N×D), where D can be very large. We want a new representation of the data Z ∈ R^(N×K), where K ≪ D:
- for computational reasons,
- to better understand / visualize the data,
- for compression, etc.
We will restrict ourselves to linear transformations.
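As a hypothetical sketch of what such a linear transformation looks like in NumPy (the sizes, the random data, and the orthonormal matrix U are illustrative assumptions, not the tutorial's data):

    import numpy as np

    # Illustrative sizes: N examples, D raw features, K reduced dimensions.
    N, D, K = 500, 10000, 3
    X = np.random.randn(N, D)                    # data matrix, one row per example
    U = np.linalg.qr(np.random.randn(D, K))[0]   # some D x K matrix with orthonormal columns
    Z = (X - X.mean(axis=0)) @ U                 # new K-dimensional representation
    print(Z.shape)                               # (500, 3)

PCA is about choosing U well; the rest of the tutorial derives that choice.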

Example
In this dataset, there are only 3 degrees of freedom: (1) horizontal translations; (2) vertical translations; (3) rotations. But each image is 100 × 100 = 10000 pixels, so X will be 10000 elements wide!

What is a Good Transformation?
The goal is to find good directions u that preserve important aspects of the data. In the linear setting: z = x^T u.
These directions will turn out to be the eigenvectors of the data covariance with the top-K eigenvalues.
Two ways to view this:
1. Find directions of maximum variation
2. Find projections that minimize the reconstruction error

Two Derivations of PCA
Consider the n-th datapoint x_n, which has 2 dimensions, x_1 and x_2.
[Figure: the point x_n plotted in the (x_1, x_2) plane]

Two Derivations of PCA
We can pick a direction u_1 to project x_n onto, creating a projected point x̃_n.
[Figure: x_n and its projection x̃_n onto the direction u_1]

Two Derivations of PCA
By the Pythagorean theorem:

    R^2 (original distance) = D^2 (variance) + ɛ^2 (reconstruction error)

[Figure: x_n at distance R, its projection onto u_1 at distance D, and the residual ɛ]

Since R^2 is fixed, we have a problem equivalence: maximizing D^2 (variance) is the same as minimizing ɛ^2 (reconstruction error).
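Written out for a unit direction u_1, and assuming x_n is measured relative to the data mean x̄ (defined on the next slide), the three quantities in the picture are:

    D = u_1^T (x_n - x̄)
    ɛ = ||(x_n - x̄) - D u_1||
    R^2 = ||x_n - x̄||^2 = D^2 + ɛ^2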

Two Derivations of PCA
Figure 12.2 from Bishop's textbook.

Principal Component Analysis: Maximum Variance
Our goal is to maximize the variance of the projected data:

    maximize  (1/N) Σ_{n=1}^{N} (u_1^T x_n - u_1^T x̄)^2 = u_1^T S u_1        (1)

where the sample mean and sample covariance are given by:

    x̄ = (1/N) Σ_{n=1}^{N} x_n        (2)
    S = (1/N) Σ_{n=1}^{N} (x_n - x̄)(x_n - x̄)^T        (3)
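A quick numerical check of Eqs. (1)-(3); the 2-D dataset below is made up purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2)) @ np.array([[3.0, 0.0],
                                               [1.0, 0.5]])   # N x D made-up data

    x_bar = X.mean(axis=0)                     # Eq. (2): sample mean
    Xc = X - x_bar
    S = Xc.T @ Xc / X.shape[0]                 # Eq. (3): sample covariance (1/N convention)

    u = np.array([1.0, 0.0])                   # any unit-length direction
    print(u @ S @ u, np.var(Xc @ u))           # the two numbers agree, as in Eq. (1)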

Lagrange Multiplier
If we want to find a stationary point of a function f(x) of multiple variables subject to one or more constraints g(x) = 0:
1. Introduce the Lagrangian function  L(x, λ) ≡ f(x) + λ g(x)        (5)
2. Find its stationary points w.r.t. both x and λ
If you are not familiar with this, check out Appendix E in Bishop's book.

Finding u_1
We want to maximize u_1^T S u_1 subject to ||u_1|| = 1 (since we are finding a direction).
Use a Lagrange multiplier α_1 to express this as:

    u_1^T S u_1 + α_1 (1 - u_1^T u_1)        (6)

Take the derivative with respect to u_1 and set it to 0:

    S u_1 - α_1 u_1 = 0        (7)
    S u_1 = α_1 u_1        (8)

So u_1 is an eigenvector of S with eigenvalue α_1. In fact, it must be the eigenvector with the maximum eigenvalue, since this maximizes the objective.
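A quick numerical check that the top eigenvector of S maximizes u^T S u (the 2 × 2 covariance matrix here is an arbitrary made-up example):

    import numpy as np

    S = np.array([[4.0, 1.5],
                  [1.5, 1.0]])                 # an illustrative covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)       # eigh: ascending eigenvalues, orthonormal eigenvectors
    u1 = eigvecs[:, -1]                        # eigenvector with the largest eigenvalue (alpha_1)
    print(u1 @ S @ u1, eigvals[-1])            # equal: the maximized variance is the top eigenvalue

    # Any other unit-length direction gives no more variance:
    rng = np.random.default_rng(0)
    v = rng.normal(size=2)
    v /= np.linalg.norm(v)
    assert v @ S @ v <= eigvals[-1] + 1e-12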

Finding u_2
We want to maximize u_2^T S u_2 subject to ||u_2|| = 1 and u_2^T u_1 = 0 (orthogonal to u_1).
Use the Lagrange form:

    u_2^T S u_2 + α_2 (1 - u_2^T u_2) - β u_2^T u_1        (9)

Take the derivative with respect to u_2 and set it to 0 to find β:

    S u_2 - α_2 u_2 - β u_1 = 0        (10)

Left-multiply (10) by u_1^T, then use u_1^T S u_2 = α_1 u_1^T u_2 (since S u_1 = α_1 u_1 and S is symmetric), u_1^T u_2 = 0, and u_1^T u_1 = 1:

    u_1^T S u_2 - α_2 u_1^T u_2 - β u_1^T u_1 = 0        (11)
    α_1 u_1^T u_2 - α_2 u_1^T u_2 - β u_1^T u_1 = 0        (12)
    α_1 · 0 - α_2 · 0 - β · 1 = 0        (13)
    ⟹ β = 0        (14)

Finding u_2
We want to maximize u_2^T S u_2 subject to ||u_2|| = 1 and u_2^T u_1 = 0 (orthogonal to u_1).
Use the Lagrange form, where the last term is now known to be 0:

    u_2^T S u_2 + α_2 (1 - u_2^T u_2) - β u_2^T u_1        (15)

Take the derivative with respect to u_2 and set it to 0 to find α_2:

    S u_2 - α_2 u_2 = 0        (16)
    S u_2 = α_2 u_2        (17)

So α_2 must be the second largest eigenvalue of S.

PCA In General
We can compute the entire PCA solution by just computing the eigenvectors of S with the top-K eigenvalues. These can be found using the singular value decomposition (SVD) of S.
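A minimal sketch of this computation (not the tutorial's demo code). It uses the SVD of the centered data matrix, whose right singular vectors are exactly the eigenvectors of S, with eigenvalues s^2 / N:

    import numpy as np

    def pca(X, K):
        """Return the top-K principal directions, the projected data, and the eigenvalues of S."""
        x_bar = X.mean(axis=0)
        Xc = X - x_bar
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows of Vt are eigenvectors of S
        components = Vt[:K]                                  # K x D principal directions
        Z = Xc @ components.T                                # N x K projected data
        return components, Z, (s ** 2) / X.shape[0]

    # Example on made-up data:
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    components, Z, eigvals = pca(X, K=2)
    print(Z.shape, eigvals[:2])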

Choosing the Number of Components K
How do we choose the number of components?
- Idea: look at the spectrum of the covariance and pick K to capture most of the variation (see the sketch below)
- More principled: a Bayesian treatment (beyond this course)
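One hedged way to code up the "look at the spectrum" heuristic: pick the smallest K whose leading eigenvalues capture, say, 95% of the total variance (the threshold is an illustrative choice, not a rule from the tutorial):

    import numpy as np

    def choose_k(eigvals, threshold=0.95):
        """Smallest K whose top eigenvalues capture at least `threshold` of the total variance."""
        eigvals = np.sort(eigvals)[::-1]                   # largest first
        cumulative = np.cumsum(eigvals) / eigvals.sum()
        return int(np.searchsorted(cumulative, threshold) + 1)

    print(choose_k(np.array([5.0, 2.0, 0.5, 0.1, 0.05])))  # -> 3 for this made-up spectrum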

Principal Component Analysis: Minimum Reconstruction Error
We can also think of PCA as minimizing the reconstruction error of the compressed data:

    minimize  (1/(2N)) Σ_{n=1}^{N} ||x_n - x̃_n||^2        (18)

We will omit some details for now, but the key is that we define some K-dimensional basis such that:

    x̃ = W x + const        (19)

The solution will turn out to be the same as in the maximum variance formulation.
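A sketch of that equivalence in code (the data are made up; the reconstruction used here is the standard projection onto the top-K eigenvectors):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))    # made-up correlated data
    K = 2

    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    S = Xc.T @ Xc / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(S)                # ascending eigenvalues
    U_K = eigvecs[:, -K:]                               # D x K basis from the max-variance solution

    X_tilde = x_bar + (Xc @ U_K) @ U_K.T                # reconstruction from the K-dimensional code
    err = np.mean(np.sum((X - X_tilde) ** 2, axis=1)) / 2      # the objective in Eq. (18)
    print(err, np.sum(eigvals[:-K]) / 2)                # equal: half the sum of discarded eigenvalues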

PCA Demo
We'll apply PCA using scikit-learn in Python on various datasets for visualization / compression:
- Synthetic 2-D data: show the principal components learned and what the transformed data looks like
- MNIST digits: compression and reconstruction
- Olivetti faces dataset: compression and reconstruction
- Iris dataset: visualization
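A sketch of the kind of scikit-learn call the demo relies on, shown here on the Iris dataset (this is not the tutorial's actual demo script; it assumes scikit-learn and matplotlib are installed):

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    import matplotlib.pyplot as plt

    iris = load_iris()
    pca = PCA(n_components=2)
    Z = pca.fit_transform(iris.data)          # project the 4-D measurements down to 2-D
    print(pca.explained_variance_ratio_)      # fraction of variance captured by each component

    plt.scatter(Z[:, 0], Z[:, 1], c=iris.target)
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.show()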

PCA Application: Compression & Reconstruction
For example, the Olivetti faces dataset: apply PCA to the face images to find the principal components, and project the data down to K dimensions.
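A hedged sketch of how this could be done with scikit-learn (K = 50 is an arbitrary illustrative choice; fetch_olivetti_faces downloads the dataset on first use):

    from sklearn.datasets import fetch_olivetti_faces
    from sklearn.decomposition import PCA

    faces = fetch_olivetti_faces()                 # 400 images of 64 x 64 = 4096 pixels
    K = 50
    pca = PCA(n_components=K).fit(faces.data)
    codes = pca.transform(faces.data)              # 400 x K compressed representation
    reconstructed = pca.inverse_transform(codes)   # back to 400 x 4096 for viewing
    print(faces.data.shape, codes.shape, reconstructed.shape)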

PCA Application: Compression & Reconstruction
[Figure: reconstructions when using various values of K]

PCA Application: Visualization
PCA can be used to find the best viewing angle for projecting the data onto a 2-D plane (or 3-D space) to better understand it.
[Figure: example on the Iris dataset]

Summary
PCA is a linear projection of D-dimensional data {x_n} onto a K ≪ D dimensional vector space spanned by the basis vectors {u_k}, such that it:
- maximizes the variance of the projected data points
- minimizes the projection error (square loss)
The {u_k} are orthonormal, and they turn out to be the first K eigenvectors of the data covariance matrix, those with the K largest eigenvalues.
Can be computed in O(K D^2).

Summary
PCA is good for:
- dimensionality reduction
- visualization
- compression (with loss)
- denoising (by removing directions of small variance in the data)
PCA can also be used for data whitening, i.e. decorrelation, so that the features have unit covariance.
Caution! In a classification task, if the signal that separates the class labels has small variance, PCA may remove it completely.
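A minimal whitening sketch (the data and mixing matrix are made up for illustration): rotate onto the eigenvectors of the covariance and rescale each direction by 1/sqrt(eigenvalue).

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3)) @ np.array([[2.0, 0.0, 0.0],
                                               [0.5, 1.0, 0.0],
                                               [0.0, 0.3, 0.2]])
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(S)
    X_white = Xc @ eigvecs / np.sqrt(eigvals)             # whitened data
    print(np.round(X_white.T @ X_white / X.shape[0], 3))  # approximately the identity matrix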

Thanks!