Principal Component Analysis (PCA)

Recall: Eigenvectors of the Covariance Matrix
Covariance matrices are symmetric, so their eigenvectors are orthogonal. The eigenvectors {v_1, v_2, ..., v_p} are ordered by the magnitude of their eigenvalues: λ_1 ≥ λ_2 ≥ ... ≥ λ_p.
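As a quick illustration (not from the original slides), here is a minimal R sketch using simulated height/weight data; the variable names and numbers are made up for the example:

# Simulated (hypothetical) height/weight data
set.seed(1)
height <- rnorm(100, mean = 170, sd = 10)
weight <- 0.5 * height + rnorm(100, sd = 5)
X <- cbind(height, weight)

e <- eigen(cov(X))                     # eigendecomposition of the covariance matrix
e$values                               # eigenvalues, ordered largest to smallest
e$vectors                              # columns are the eigenvectors (principal components)
round(t(e$vectors) %*% e$vectors, 10)  # identity matrix: the eigenvectors are orthonormal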

Recall: Direction of Maximal Variance
The first eigenvector of the covariance matrix points in the direction of maximum variance in the data. This eigenvector is the first principal component.
[Figure: scatterplot of weight (w) vs. height (h) with the first principal component direction v_1 drawn through the point cloud.]

Recall: Secondary Directions
The second eigenvector of the covariance matrix points in the direction, orthogonal to the first, that has the maximum remaining variance.
[Figure: the same scatterplot with the second principal component direction v_2 drawn orthogonal to v_1.]

Eigenvalues
The corresponding eigenvalues, λ_1 and λ_2, tell us the amount of variance in each direction. What do we mean by variance in that direction?
[Figure: the same scatterplot shown twice, once highlighting the spread along v_1 and once the spread along v_2.]
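To make "variance in that direction" concrete, here is a hedged R sketch (same kind of simulated height/weight data as above): project the centered data onto an eigenvector and take the ordinary variance of the resulting coordinates; it matches the corresponding eigenvalue.

set.seed(1)
height <- rnorm(100, mean = 170, sd = 10)
weight <- 0.5 * height + rnorm(100, sd = 5)
X <- scale(cbind(height, weight), center = TRUE, scale = FALSE)  # centered data

e <- eigen(cov(X))
var(X %*% e$vectors[, 1])   # variance along v_1 -- equals e$values[1]
var(X %*% e$vectors[, 2])   # variance along v_2 -- equals e$values[2]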

A New Basis
Principal components provide us with a new orthogonal basis in which the new coordinates are uncorrelated.

Total Variance
The total amount of variance is defined as the sum of the variances of each variable:
TotalVariation = Var(height) + Var(weight)
In our new principal component basis, the total amount of variance has not changed:
TotalVariation = λ_1 + λ_2
The proportion of the variance directed along (explained by) the first component is then λ_1 / (λ_1 + λ_2), and likewise λ_2 / (λ_1 + λ_2) for the second component.

Total Variance
Another way to state this fact is to use the theorem from linear algebra that says that for any square matrix A,
Trace(A) = Σ_{i=1}^{n} λ_i.
Since A in our situation is the covariance matrix, Trace(A) is the sum of the variances of each variable.
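A quick numerical check of this identity in R, using the built-in iris measurements as stand-in data:

S_cov <- cov(iris[, 1:4])        # covariance matrix of four numeric variables
sum(diag(S_cov))                 # trace: the sum of the variable variances
sum(eigen(S_cov)$values)         # sum of the eigenvalues -- the same number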

Total Variance
No matter how many components we have, the proportion of variance explained by component i is the corresponding eigenvalue divided by the sum of all the eigenvalues (the total variance):
Proportion explained by component i = λ_i / Σ_{j=1}^{n} λ_j
The proportion of variance explained by the first k components is then the sum of the first k eigenvalues divided by the total variance:
Proportion explained by first k components = (Σ_{i=1}^{k} λ_i) / (Σ_{j=1}^{n} λ_j)
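These two formulas are one line each in R; a sketch using the built-in iris measurements:

lambda <- eigen(cov(iris[, 1:4]))$values   # eigenvalues of the covariance matrix
lambda / sum(lambda)                       # proportion of variance explained by each component
cumsum(lambda) / sum(lambda)               # proportion explained by the first k components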

Let's Practice
1. Suppose we have a dataset with 8 variables and we use standardized data (i.e. correlation PCA). What is the total amount of variance in our data?
2. Suppose I have a dataset with 3 variables and the eigenvalues of the covariance matrix are λ_1 = 3, λ_2 = 2, λ_3 = 1.
   a. What proportion of variance is explained by the first principal component?
   b. What is the variance of the second principal component?
   c. What proportion of variance is captured by using both the first and second principal components?

Zero Eigenvalues
What would it mean if λ_2 = 0? The variance along that direction is exactly zero: projected onto v_2, every data point lands in the exact same spot, so height and weight must be perfectly correlated. The data are essentially one-dimensional, and v_1 alone explains 100% of the variation in height and weight.
[Figure: scatterplot of weight (w) vs. height (h) with all points lying exactly on the line spanned by v_1.]

Small Eigenvalues
When eigenvalues are close to zero, there is not much variance in that direction, so we won't lose much by ignoring or dropping the corresponding component. Dropping components amounts to an orthogonal projection onto a subspace: the span of the retained principal components (eigenvectors). This is dimension reduction.
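A sketch of "dropping components = orthogonal projection" in R (iris measurements again; k = 2 is an arbitrary choice for the example):

X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)  # centered data
V <- eigen(cov(X))$vectors
k <- 2
Vk <- V[, 1:k]                   # keep only the first k principal components
X_hat <- X %*% Vk %*% t(Vk)      # orthogonal projection onto their span
mean((X - X_hat)^2)              # small when the dropped eigenvalues are close to zero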

Scree Plot
A plot of the eigenvalues, sometimes used to guess the number of latent components by finding an "elbow" in the curve.
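A minimal R sketch of a scree plot, again with the iris measurements:

lambda <- eigen(cov(iris[, 1:4]))$values
plot(lambda, type = "b", xlab = "Component", ylab = "Eigenvalue (variance)",
     main = "Scree plot")
# equivalently, using prcomp():
screeplot(prcomp(iris[, 1:4]), type = "lines")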

Coordinates in the new basis
To make a plot of the data in the new basis, we need to know the coordinates of the data in that basis. So for each observation i we need to solve:
Observation_i = α_{1i} v_1 + α_{2i} v_2

Coordinates in the new basis
To find the coordinates in the new basis, we simply use the formula for each principal component:
PC_1 = v_1 = (0.7, 0.7)^T, so PC_1 = 0.7h + 0.7w
PC_2 = v_2 = (-0.7, 0.7)^T, so PC_2 = -0.7h + 0.7w
Thus, the new coordinates (called scores) are collected in the matrix S:
S = XV
where X is the n x 2 data matrix (rows = observations, columns = height and weight) and V is the matrix whose columns are v_1 and v_2. X contains centered (covariance PCA) or standardized (correlation PCA) data.
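In R, the scores S = XV can be computed directly from the matrix product, or read off from prcomp(); a sketch with the iris measurements (columns may differ from prcomp's by a sign, as a later slide notes):

X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)  # centered (covariance PCA)
V <- eigen(cov(X))$vectors        # loadings: one column per principal component
S <- X %*% V                      # scores: coordinates of the data in the new basis

pc <- prcomp(iris[, 1:4])         # covariance PCA; prcomp centers internally
head(pc$x)                        # prcomp's scores
head(S)                           # same values, possibly with some columns sign-flipped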

Summary of Output
Three major pieces of output:
- Eigenvectors (principal components / variable loadings), sometimes called the rotation matrix
- Eigenvalues (variances of the components)
- Coordinates of the data in the new basis (the output dataset in SAS, out=...), often called scores or scored data
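For comparison, a sketch of where these three pieces live in the output of R's prcomp() (the score matrix is the analogue of SAS's out= dataset):

pc <- prcomp(iris[, 1:4])   # covariance PCA of the four iris measurements
pc$rotation                 # eigenvectors: principal components / variable loadings
pc$sdev^2                   # eigenvalues: variance of each component
head(pc$x)                  # scores: coordinates of the data in the new basis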

Correlation matrix vs. Covariance matrix
PCA can be done using eigenvectors of either the covariance matrix or the correlation matrix.
- The default in SAS is the correlation matrix.
- The default in R is the covariance matrix (at least for most packages - always check!).

Correlation matrix vs. Covariance matrix
Covariance PCA: the data are centered (essentially unchanged) before directions of maximal variance are drawn. Use when the scales of the variables are not very different.
Correlation PCA: the data are centered and normalized/standardized before directions of maximal variance are drawn. Use when the scales of the variables are very different.
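With R's prcomp(), the choice is a single argument (a sketch; defaults vary across functions and packages, so always check):

pc_cov <- prcomp(iris[, 1:4], scale. = FALSE)  # covariance PCA: center only (the default)
pc_cor <- prcomp(iris[, 1:4], scale. = TRUE)   # correlation PCA: center and standardize
pc_cov$sdev^2   # eigenvalues of the covariance matrix
pc_cor$sdev^2   # eigenvalues of the correlation matrix (they sum to the number of variables)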

Results may differ by a sign or constant factor!
SAS code:

proc princomp data=iris out=irispc;
  var sepal_length sepal_width petal_length petal_width;
run;

Covariance option:

proc princomp data=iris out=irispc cov;
  var sepal_length sepal_width petal_length petal_width;
run;

SAS's strange problem
To the best of my knowledge, SAS will not perform a typical PCA for datasets with fewer observations than variables. For our examples, we will use R.