Principal Component Analysis. Applied Multivariate Statistics Spring 2012


Overview
- Intuition
- Four definitions
- Practical examples
- Mathematical example
- Case study

PCA: Goals
- Goal 1: Dimension reduction to a few dimensions (use the first few PCs)
- Goal 2: Find a one-dimensional index that separates objects best (use the first PC)

PCA: Intuition
Find the low-dimensional projection with the largest spread.

PCA: Intuition
(figure: the point (0.3, 0.5) shown in the standard basis)

PCA: Intuition
The same point expressed in different bases:
                          X1    X2
  Standard basis          0.3   0.5
  PC basis                0.7   0.1
  After dim. reduction    0.7   -
The PC basis is a rotated basis:
- Vector 1 = first principal component (1. PC): direction of largest variance
- Vector 2 = second principal component (2. PC): perpendicular to vector 1
Dimension reduction: keep only the coordinates of the first (few) PCs.

PCA: Intuition in 1d
(figure taken from The Elements of Statistical Learning, T. Hastie et al.)

PCA: Intuition in 2d
(figure taken from The Elements of Statistical Learning, T. Hastie et al.)

PCA: Four equivalent definitions (always center the data first!)
Good for intuition:
- Orthogonal directions with the largest variance
- Linear subspace (straight line, plane, etc.) with minimal squared residuals
Good for computing:
- Via the spectral decomposition (= eigendecomposition)
- Via the singular value decomposition (SVD)

PCA (Version 1): Orthogonal directions
- PC 1 is the direction of largest variance
- PC 2 is perpendicular to PC 1 and again has the largest variance
- PC 3 is perpendicular to PC 1 and PC 2 and again has the largest variance
- etc.
(figure: PC 1, PC 2, PC 3)

PCA (Version 2): Best linear subspace
- PC 1: straight line with the smallest orthogonal distance to all points
- PC 1 & PC 2: plane with the smallest orthogonal distance to all points
- etc.

PCA (Version 3): Eigendecomposition
Spectral decomposition theorem: every symmetric, positive semidefinite matrix R can be rewritten as R = A D A^T, where D is diagonal and A is orthogonal.
- The eigenvectors of the covariance/correlation matrix are the PCs: the columns of A are the PCs
- The diagonal entries of D (= eigenvalues) are the variances along the PCs (usually sorted in decreasing order)
R: function princomp
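
A minimal sketch in R of this version, using a small made-up data matrix X rather than the lecture's data: take the eigendecomposition of the covariance matrix and compare it with princomp.

  # PCA via the spectral decomposition of the covariance matrix (made-up data)
  set.seed(1)
  X <- matrix(rnorm(200), ncol = 2)
  colnames(X) <- c("x1", "x2")
  S <- cov(X)                # symmetric, positive semidefinite
  e <- eigen(S)              # S = A D A^T
  e$vectors                  # columns = PCs (loadings)
  e$values                   # eigenvalues = variances along the PCs
  pca <- princomp(X)
  pca$loadings               # same directions as e$vectors, up to sign
  pca$sdev^2                 # differs from e$values by a factor (n-1)/n,
                             # since princomp divides by n and cov by n-1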

PCA (Version 4): Singular Value Decomposition
Singular value decomposition: every matrix R can be rewritten as R = U D V^T, where D is diagonal and U and V are orthogonal.
- The columns of V are the PCs
- The diagonal entries of D are the singular values; they are related to the standard deviations along the PCs (usually sorted in decreasing order)
- UD contains the samples measured in PC coordinates
R: function prcomp
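
A matching sketch of the SVD version, again on made-up data rather than the lecture's: center the data, take the SVD, and compare with prcomp.

  # PCA via the SVD of the centered data matrix (made-up data)
  set.seed(1)
  X <- matrix(rnorm(200), ncol = 2)
  Xc <- scale(X, center = TRUE, scale = FALSE)    # always center first
  sv <- svd(Xc)                                   # Xc = U D V^T
  sv$v                                            # columns = PCs
  sv$d / sqrt(nrow(X) - 1)                        # standard deviations along the PCs
  head(sv$u %*% diag(sv$d))                       # samples in PC coordinates (scores)
  prcomp(X)$rotation                              # same PCs (up to sign)
  prcomp(X)$sdev                                  # same standard deviations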

Example: Head size of sons
- Standard deviation in the direction of the 1. PC: 12.69; variance 167.77
- Standard deviation in the direction of the 2. PC: 5.22; variance 28.33
- Total variance = 167.77 + 28.33 = 196.1
- The 1. PC contains 167.77/196.1 = 0.86 of the total variance
- The 2. PC contains 28.33/196.1 = 0.14 of the total variance
- y1 = 0.69*x1 + 0.72*x2
- y2 = -0.72*x1 + 0.69*x2

Computing PC scores
- Subtract the mean of each variable
- Output of princomp: $scores; the first column is the coordinate in the direction of the 1. PC, the second column the coordinate in the direction of the 2. PC, etc.
- Manually (e.g. for new observations): the scalar product with the loading of the i-th PC gives the coordinate in the direction of the i-th PC
- Predict new scores: use the function predict (see ?predict.princomp)
- Example: head size of sons
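
A minimal sketch of these three ways to obtain scores, on a made-up two-variable data frame instead of the head-size data:

  # PC scores from princomp, by hand, and for new observations (made-up data)
  set.seed(1)
  X <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
  pca <- princomp(X)
  head(pca$scores)                          # column i = coordinates along the i-th PC
  Xc <- scale(X, center = TRUE, scale = FALSE)
  head(Xc %*% pca$loadings)                 # manual: center, then scalar product with loadings
  newobs <- data.frame(x1 = 0.3, x2 = 0.5)
  predict(pca, newdata = newobs)            # scores for a new observation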

Interpretation of PCs
- Often hard
- Look at the loadings and try to interpret; in the head-size example:
  - 1. PC (0.69*x1 + 0.72*x2): average head size of both sons
  - 2. PC (-0.72*x1 + 0.69*x2): difference in head sizes of the two sons

To scale or not to scale?
- R: in princomp, the option cor = TRUE scales the variables; equivalently, the correlation matrix is used instead of the covariance matrix
- Use the correlation matrix if variables measured in different units are compared
- Using the covariance matrix will essentially pick the variable with the largest spread as the 1. PC
- Example: blood measurements
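
A minimal sketch of the effect, assuming two made-up variables whose spreads differ by two orders of magnitude (this is not the blood data):

  # Covariance- vs. correlation-based PCA (made-up data with very different spreads)
  set.seed(1)
  X <- data.frame(a = rnorm(100, sd = 100),   # e.g. a variable in other units
                  b = rnorm(100, sd = 1))
  princomp(X)$loadings              # covariance: the 1. PC is essentially variable a
  princomp(X, cor = TRUE)$loadings  # correlation: both variables contribute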

How many PCs?
No clear-cut rules, only rules of thumb:
- Rule of thumb 1: the cumulative proportion should be at least 0.8 (i.e. 80% of the variance is captured)
- Rule of thumb 2: keep only PCs with above-average variance (if the correlation matrix / scaled data was used, this means: keep only PCs with eigenvalues of at least one)
- Rule of thumb 3: look at the scree plot; keep only the PCs before the elbow (if there is one)
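
A minimal sketch of where to read off these quantities in R, on a made-up five-variable data set:

  # Rules of thumb for the number of PCs (made-up data)
  set.seed(1)
  X <- matrix(rnorm(500), ncol = 5)
  pca <- princomp(X, cor = TRUE)
  summary(pca)                     # "Cumulative Proportion" row -> rule of thumb 1
  pca$sdev^2                       # eigenvalues -> rule of thumb 2 (keep those >= 1)
  screeplot(pca, type = "lines")   # rule of thumb 3: look for an elbow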

How many PCs: blood example
- Rule 1: 5 PCs
- Rule 2: 3 PCs
- Rule 3: elbow after PC 1 (?)

Mathematical example in detail: computing eigenvalues and eigenvectors (see blackboard)
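
The blackboard derivation is not reproduced in this transcript; as a stand-in, here is a small made-up 2x2 example checked with R's eigen:

  # Eigenvalues and eigenvectors of a small symmetric matrix (made-up example)
  S <- matrix(c(5, 2,
                2, 2), nrow = 2, byrow = TRUE)
  e <- eigen(S)
  e$values                                             # 6 and 1
  e$vectors                                            # corresponding eigenvectors
  S %*% e$vectors[, 1] - e$values[1] * e$vectors[, 1]  # ~ 0: S v = lambda v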

Case study: heptathlon, Seoul 1988

Biplot: shows information on samples AND variables
Approximately true:
- Data points: projection onto the first two PCs
- Distance in the biplot ~ true distance
- Projection of a sample onto an arrow gives the original (scaled) value of that variable
- Arrow length: variance of the variable
- Angle between arrows: correlation
The approximation is often crude, but it is good for a quick overview.
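
A minimal sketch of producing a biplot in R; the built-in USArrests data set is used here as a stand-in, not the heptathlon data from the case study:

  # Biplot of the first two PCs (stand-in data set)
  pca <- princomp(USArrests, cor = TRUE)
  biplot(pca)   # points = samples in the first two PCs, arrows = variables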

PCA: Eigendecomposition vs. SVD
PCA based on the eigendecomposition: princomp
+ easier-to-understand mathematical background
+ more convenient summary method
PCA based on the SVD: prcomp
+ numerically more stable
+ still works if there are more dimensions than samples
Both methods give the same results up to small numerical differences.
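
A minimal comparison of the two functions on made-up data, illustrating that they agree up to sign and up to the different variance divisors:

  # princomp (eigendecomposition, divisor n) vs. prcomp (SVD, divisor n-1)
  set.seed(1)
  X <- matrix(rnorm(200), ncol = 4)
  p1 <- princomp(X)
  p2 <- prcomp(X)
  unclass(p1$loadings)   # same directions as p2$rotation, up to sign
  p2$rotation
  p1$sdev / p2$sdev      # constant ratio sqrt((n-1)/n) from the different divisors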

Concepts to know
- The four definitions of PCA
- Interpretation: output of princomp, biplot
- Predicting scores for new observations
- How many PCs? Scale or not?
- Advantages of PCA based on the SVD

R functions to know
- princomp, biplot
- prcomp: just know that it exists and that it implements the SVD approach