
PRINCIPAL COMPONENT ANALYSIS
Dimensionality Reduction
Tzompanaki Katerina
06/04/2018

Dimensionality Reduction
Unsupervised learning. Goal: find hidden patterns in the data.
Used for:
- Visualization
- Data compression
- Data preprocessing step for supervised algorithms
Reduce the number of considered data dimensions:
- Feature selection: keep a subset of the existing dimensions.
- Feature extraction: identify combinations of the features that can be used to replace the existing ones. Typically, they are more informative and better suited for analysis in high-dimensional spaces. Principal Component Analysis is a feature extraction algorithm.

Principal Component Analysis
Question: Are our original features appropriate for analyzing our data? Is there a better set of features that we could use?
Answer: Investigate how correlated the original features are and how widely the values of each feature are spread in the value space.
- Features that are highly correlated with each other contain redundancy!
- Features with small variation carry little information!
Construct a new set of features based on the variance and covariance of the original features.

PCA: Motivating Example
Consider circles, for which area = πr² and circumference = 2πr. Here our data are described by two features, circumference (x-axis) and area (y-axis), which are obviously correlated: what truly matters is the radius of the circle. So, how can we obtain a new feature that is truly informative of our data?
- Find directions of maximal variance. In the scatter plot, the variation of the data points is large along one direction (red) and small along another (green).
- Find directions that are mutually orthogonal.
The red and the green directions are the principal components, the new dimensions for the data. As the data vary (a lot) more along the red direction, it is the first principal component; the green one is the second principal component.
So, let's transform the data space by rotating the axes onto these directions. On the new dimensions we can easily see that most of the information is carried by PC1, and that PC2 can be safely dropped. This means that the data points will be projected onto PC1: finally, we get the one dimension that we were intuitively seeking in the beginning.
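As an illustrative sketch of this example (the data generation below is my own, not from the slides), we can sample circles with random radii and check that almost all of the variance lies along a single direction:

import numpy as np

rng = np.random.default_rng(0)
r = rng.uniform(1.0, 5.0, size=100)            # random radii
X = np.column_stack([2 * np.pi * r,            # feature x: circumference
                     np.pi * r ** 2])          # feature y: area

X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
print(eigvals / eigvals.sum())                 # eigh sorts ascending: the last share (PC1) is close to 1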

Principal Component Analysis
Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
- PCA finds directions of maximal variance.
- All principal components are orthogonal to one another.
- If there are N features in the original data, there are N possible principal components. These are ordered by their eigenvalue, which captures the variance along them.
- The principal components with the lowest variance (eigenvalue) are not informative and may be discarded to obtain M < N features (a selection among the new features).
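These properties can be checked with a minimal sketch using scikit-learn, assuming it is installed (the toy data and variable names here are illustrative, not from the slides):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two correlated features: the second is a noisy multiple of the first.
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2])

pca_model = PCA(n_components=2)
pca_model.fit(X)
print(pca_model.explained_variance_)               # variances (eigenvalues), sorted in decreasing order
print(pca_model.components_)                       # principal components, one per row
print(pca_model.components_ @ pca_model.components_.T)  # ~ identity matrix: the directions are orthonormal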

PCA: Algorithm
1. Make the input data zero-mean, i.e., subtract the mean of each feature from each observation of that feature. (Sometimes we also make the data unit-variance by scaling.)
2. Compute the covariance matrix of the zero-mean data.
3. Compute the eigenvectors and eigenvalues of the covariance matrix. The eigenvectors are the principal components; the first principal component has the highest eigenvalue, the second principal component the second highest, and so on.
4. Build the new feature vector and perform feature selection if necessary. If the eigenvalues of some principal components are zero, the feature selection happens automatically.
5. Transform the data to the new feature system.
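A minimal NumPy sketch of these five steps (the function name, array layout, and return values are my own choices, not from the slides):

import numpy as np

def pca(X, n_components=None):
    # X holds one data item per row and one feature per column
    # (the transpose of the "features as rows" layout used later in the slides).
    mean = X.mean(axis=0)                     # Step 1: zero-mean the data
    X_centered = X - mean
    cov = np.cov(X_centered, rowvar=False)    # Step 2: covariance matrix, 1/(N-1) normalization
    eigvals, eigvecs = np.linalg.eigh(cov)    # Step 3: eigenvalues/eigenvectors (ascending order)
    order = np.argsort(eigvals)[::-1]         # reorder so the first PC has the highest eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_components is not None:              # Step 4: keep only the leading components
        eigvecs = eigvecs[:, :n_components]
    transformed = X_centered @ eigvecs        # Step 5: transform to the new feature system
    return transformed, eigvals, eigvecs, mean

Calling pca(X, n_components=1) on the data of the next slides reproduces, up to possible sign flips of the eigenvectors, the numbers derived step by step below.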

Eigenvectors & eigenvalues
Some background on matrices: let A be a square matrix and C a nonzero vector that is multiplication-compatible with A. Then C is said to be an eigenvector of A if there exists a (real or complex) number λ such that AC = λC. λ is called the eigenvalue for C.
- For a symmetric n x n matrix A (such as a covariance matrix), there exist n mutually orthogonal eigenvectors.
- Scaled eigenvectors are still eigenvectors: what changes is the length of the vector, not its direction. We will be looking for unit-length eigenvectors.
- Computing the eigenvectors of a matrix by hand is not easy when n > 3; we will resort to numerical libraries.

Eigenvectors & eigenvalues
For example, if A = [2 3; 2 1] and C = (3, 2)ᵀ, then C is an eigenvector of A, because
AC = [2 3; 2 1] (3, 2)ᵀ = (12, 8)ᵀ = 4 x (3, 2)ᵀ = 4C.
The eigenvalue is 4. A scaled vector, e.g. (6, 4)ᵀ, is still an eigenvector with eigenvalue 4.
To find the corresponding unit-length vector, we divide by the length: |C| = √(3² + 2²) = √13, thus C_unit = (3/√13, 2/√13)ᵀ.
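A quick NumPy check of this example (a sketch; variable names are illustrative):

import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
C = np.array([3.0, 2.0])

print(A @ C)                     # [12. 8.] == 4 * C, so C is an eigenvector with eigenvalue 4
C_unit = C / np.linalg.norm(C)   # divide by |C| = sqrt(13)
print(C_unit)                    # [3, 2] / sqrt(13), approximately [0.832, 0.555]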

PCA Step 1: zero-mean
Original data (x, y):
2.5 2.4
0.5 0.7
2.2 2.9
1.9 2.2
3.1 3.0
2.3 2.7
2.0 1.6
1.0 1.1
1.5 1.6
1.1 0.9
Subtract the mean of each feature (mean x = 1.81, mean y = 1.91) to obtain the zero-mean data (x, y):
0.69 0.49
-1.31 -1.21
0.39 0.99
0.09 0.29
1.29 1.09
0.49 0.79
0.19 -0.31
-0.81 -0.81
-0.31 -0.31
-0.71 -1.01
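A sketch of this step in NumPy, using the data from the slide (array names are my own):

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

mean = X.mean(axis=0)      # [1.81, 1.91]
X_centered = X - mean      # matches the zero-mean table above
print(X_centered)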

PCA Step 1: zero-mean (figure: plot of the zero-mean data)

PCA Step 2: covariance matrix
The covariance matrix for two variables x and y is:
        x          y
  x   σ²(x)     cov(x,y)
  y   cov(y,x)  σ²(y)
For zero-mean data (with x and y stored as row vectors of the N observations), the (co)variance is given by
  cov(x, y) = (1/(N-1)) x yᵀ   and   σ²(x) = (1/(N-1)) x xᵀ.
For our example we find (verify the result):
        x             y
  x   0.616555556   0.615444444
  y   0.615444444   0.716555556
Since the covariance is positive, we expect the two variables to increase together.
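The matrix above can be reproduced with NumPy (a sketch; the array layout is my own choice):

import numpy as np

# The zero-mean data from Step 1, one data item per row.
X_centered = np.array([[0.69, 0.49], [-1.31, -1.21], [0.39, 0.99], [0.09, 0.29],
                       [1.29, 1.09], [0.49, 0.79], [0.19, -0.31], [-0.81, -0.81],
                       [-0.31, -0.31], [-0.71, -1.01]])

# rowvar=False: each column is a feature; np.cov uses the 1/(N-1) normalization by default.
cov = np.cov(X_centered, rowvar=False)
print(cov)
# [[0.61655556 0.61544444]
#  [0.61544444 0.71655556]]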

PCA Step 3: eigenvectors & eigenvalues
For our running example we find:
  eigenvalues = (0.0490833989, 1.28402771)
  eigenvectors = [ -0.735178656  -0.677873399
                    0.677873399  -0.735178656 ]
(each column is a unit-length eigenvector, in the same order as the eigenvalues)
The eigenvectors are the principal components. The eigenvalues designate which is the first and which is the second principal component: the second column, with eigenvalue 1.28402771, is the first principal component. Here we can see that the first principal component is far more informative than the second one, because its eigenvalue is much higher.
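These numbers can be checked with NumPy's symmetric eigensolver (a sketch):

import numpy as np

cov = np.array([[0.616555556, 0.615444444],
                [0.615444444, 0.716555556]])

# eigh is appropriate for symmetric matrices; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)
print(eigvals)    # [0.0490834  1.28402771]  (up to rounding)
print(eigvecs)    # columns are unit-length eigenvectors (possibly with flipped signs)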

PCA Step 3: eigenvectors & eigenvalues (figure)

PCA Step 4: New feature vector
New feature vector = (eigv1 eigv2 ... eigvn), a matrix whose columns are the chosen eigenvectors, ordered by decreasing eigenvalue.
If we keep both principal components:
  [ -0.677873399  -0.735178656
    -0.735178656   0.677873399 ]
If we discard the second principal component:
  [ -0.677873399
    -0.735178656 ]
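A sketch of this step, starting from the eigenpairs of Step 3 (the ordering and slicing conventions are my own):

import numpy as np

cov = np.array([[0.616555556, 0.615444444],
                [0.615444444, 0.716555556]])
eigvals, eigvecs = np.linalg.eigh(cov)

order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
feature_vector = eigvecs[:, order]         # keep both principal components
feature_vector_1 = eigvecs[:, order[:1]]   # keep only the first principal component
print(feature_vector_1)                    # ~ [[0.6778734], [0.7351787]], up to an overall sign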

PCA Step 5: New dataset
To transform the data to the new feature system we apply the following formula:
  TransformedData = RowNewFeatureVector x RowZeroMeanData
where
- RowNewFeatureVector: each row is an eigenvector. Dimensions: (#eigenvectors, #originalfeatures).
- RowZeroMeanData: each row is a feature and each column is a (zero-mean) data item. Dimensions: (#originalfeatures, #dataitems).
So TransformedData is a (#eigenvectors, #dataitems) matrix. The data have thus been transformed to a new feature system in which each new feature is a linear combination of the original ones.
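A sketch of the transformation for the running example, keeping only the first principal component (array names are my own):

import numpy as np

# The zero-mean data from Step 1, one data item per row.
X_centered = np.array([[0.69, 0.49], [-1.31, -1.21], [0.39, 0.99], [0.09, 0.29],
                       [1.29, 1.09], [0.49, 0.79], [0.19, -0.31], [-0.81, -0.81],
                       [-0.31, -0.31], [-0.71, -1.01]])

row_feature_vector = np.array([[-0.677873399, -0.735178656]])   # 1 eigenvector x 2 features
row_zero_mean_data = X_centered.T                               # 2 features x 10 data items

transformed = row_feature_vector @ row_zero_mean_data           # 1 x 10 matrix
print(transformed)   # first entry ~ -0.82797: the projection of (0.69, 0.49) onto PC1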

PCA Step 5: New dataset (figure: the transformed data for our example)

What's next?
The new data and features are used as input to a subsequent supervised learning algorithm.
If PCA is used for compression, we may want to get the original data back:
- Lossless: if we kept all the principal components.
- Lossy: if we kept only some of the principal components.

Getting the original data back
Apply the formula
  OriginalData = NewFeatureVector x TransformedData + OriginalMean*
where NewFeatureVector is the matrix of kept eigenvectors as columns (the transpose of RowNewFeatureVector).
For our example, and for the transformed data obtained with one eigenvector, the reconstructed data are shown in the figure: they lie along the first principal component.
*Note that if we had also normalized to unit variance, we would have to revert that scaling as well.
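A sketch of this lossy reconstruction from one principal component, continuing the running example (names are my own):

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
mean = X.mean(axis=0)                              # [1.81, 1.91]

pc1 = np.array([[-0.677873399], [-0.735178656]])   # NewFeatureVector with a single column
transformed = pc1.T @ (X - mean).T                 # 1 x 10 transformed data
reconstructed = (pc1 @ transformed).T + mean       # lossy reconstruction, 10 x 2
print(np.round(reconstructed, 2))                  # close to X, but not identical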

Sources
Lindsay I. Smith: A Tutorial on Principal Components Analysis