Dimensionality Reduction

Lecture 5

Outline
1. Overview
   a) What is dimensionality reduction?
   b) Why?
2. Principal Component Analysis (PCA)
   a) Objectives
   b) Explaining variability
   c) SVD
3. Related approaches
   a) ICA
   b) Autoencoders

Example 1: Sportsball
Consider watching a sportsball game in 4K UHD (3840 × 2160), with a fixed camera.
How many pixels? >8M
If all we care about is the location of the ball, how many dimensions of interest? 3: (x, y, z)

Example 2: Image Embedding
Take a small, known picture (e.g. ) and embed it within a large white space (100x100).
How many pixels? 10K
How many degrees of freedom? 3 (h_transform, v_transform, rotation)

What is Dimensionality Reduction?
The removal of attributes thought to be irrelevant for the task at hand.
Many approaches exist; we focus on unsupervised, linear methods.

Why Reduce Dimensionality?
Noise reduction
Imputing (missing) data: estimate values for small d, reconstruct approximately (e.g. recommender systems!)
Visualizing data/results: better for hu-mahns! Think back to the effect of high dimensionality in clustering!
Reducing the computational cost of other processes: compression, clustering, classification, ...

Principal Component Analysis (PCA)
An approach that learns a rotated set of axes from the input data.
The axes optimize two equivalent objectives:
  Maximize projected variance
  Minimize reconstruction error

Example
Consider the following data: [figure]

Intuition: Projection Variance
[Figure: projecting the data onto one axis gives sample variance ~ 0.6; projecting onto the other gives sample variance ~ 9.5.]

Intuition: Reconstruction Error
[Figure: reconstructing the data from one axis gives SSE ~ 85.4; reconstructing from the other gives SSE ~ 5.2.]

Minimization in Action

Intuition: Equivalent Objectives
Variance of Data (fixed) = Captured Variance (want large!) + Reconstruction Error (want small!)
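As a sanity check, here is a minimal NumPy sketch (not part of the slides; data and the axis u are arbitrary) verifying that, for a centered dataset and any unit-length axis, the total variance splits exactly into captured variance plus reconstruction error:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.5], [0.5, 1.0]])
X = X - X.mean(axis=0)                 # center the data

u = np.array([1.0, 0.3])
u = u / np.linalg.norm(u)              # an arbitrary unit-length projection axis

z = X @ u                              # scalar projections onto u
X_hat = np.outer(z, u)                 # reconstructions lying on the axis

total_var = np.sum(X ** 2) / len(X)            # variance of the (centered) data
captured = np.sum(z ** 2) / len(X)             # captured variance
recon_err = np.sum((X - X_hat) ** 2) / len(X)  # reconstruction error

print(np.isclose(total_var, captured + recon_err))  # True for any unit u
```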

PCA: Big Picture Idea
Find an axis (i.e. a basis vector) that explains the most variance in the data.
Then find a second axis that is orthogonal to the first and, given that constraint, explains the most variance.
Then find a third axis, and so on.
Keep those axes that explain sufficient variation.

Example: 2D
[Figure] Any information lost?

Example: 17D (setosa.io)

Deriving PCA via Maximizing Variance
Big picture:
1. Express the variance of the projected data
2. Maximize it, solving for the projection vector
P.S. Expect abused notation :)

Some Definitions
Dataset: X = { x_n }, n = 1, ..., N, each Euclidean with dimensionality D.
Goal: find the axis of most variance.
Direction: u, Euclidean, dimensionality D. Assume unit length: u^T u = 1.
(More generally, U = [ u_1  u_2  ... ])

Variance of Projected Data
Var[ Xu ] = E[ ( Xu - E[Xu] )^2 ]
          = E[ ( Xu - E[X]u )^2 ]
          = E[ ( ( X - E[X] ) u )^2 ]
          = u^T E[ ( X - E[X] )( X - E[X] )^T ] u
          = u^T S u
Look familiar? Covariance!
Can we maximize this directly?
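A quick NumPy check (added here, not from the original slides) that the sample variance of the projections matches u^T S u for an arbitrary unit vector u:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
S = np.cov(X, rowvar=False)             # D x D sample covariance (ddof=1)

u = np.array([0.5, -1.0, 2.0])
u = u / np.linalg.norm(u)               # unit-length direction

proj_var = np.var(X @ u, ddof=1)        # sample variance of the projected data
print(np.isclose(proj_var, u @ S @ u))  # True: Var[Xu] = u^T S u
```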

Maximize, subject to u^T u = 1
Introduce a Lagrange multiplier λ and set the gradient to zero:
∂/∂u [ u^T S u + λ(1 - u^T u) ] = 0
(S + S^T) u - 2λu = 0
Su = λu        (S is symmetric!)
Familiar? So u = the eigenvector of S with the largest eigenvalue λ.

So...
The projection vector that captures maximum variance is the eigenvector of the covariance matrix with the largest eigenvalue.
This is the first principal component (PC1).
Because the covariance matrix is symmetric, all of its eigenvectors are pairwise orthogonal. (Prove this in HW!)

Applying PCA
1. Transform the data: for each dimension, mean = 0 and variance = 1: x' = ( x - μ ) / σ
2. Compute the eigendecomposition of the covariance matrix
3. Select k principal components
4. Project the data (XU)
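A minimal NumPy sketch of these four steps (illustrative only; the function and variable names are my own, and it assumes no dimension has zero variance):

```python
import numpy as np

def pca_fit_transform(X, k):
    """Toy PCA: standardize, eigendecompose the covariance, project onto the top k axes."""
    # 1. Transform the data: zero mean, unit variance per dimension
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sigma
    # 2. Eigendecomposition of the covariance matrix
    S = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)       # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 3. Select k principal components
    U_k = eigvecs[:, :k]
    # 4. Project the data
    return Z @ U_k, eigvals

X = np.random.default_rng(2).normal(size=(100, 4))
X_proj, eigvals = pca_fit_transform(X, k=2)
print(X_proj.shape)   # (100, 2)
```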

Computing the Eigendecomposition
Values: symbolically via the characteristic equation, or via power iteration.
Vectors: a system of linear equations w.r.t. the characteristic equation, or the definition of the eigendecomposition.
See LRU for examples. Practice in HW!
Note the issues related to speed vs. accuracy.
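For the power-iteration option, a rough sketch (my own code; it assumes S is positive semi-definite, e.g. a covariance matrix). Convergence slows when the top two eigenvalues are close, which is part of the speed-vs-accuracy trade-off noted above:

```python
import numpy as np

def power_iteration(S, num_iters=1000, tol=1e-10):
    """Dominant eigenvalue/eigenvector of a symmetric PSD matrix S via power iteration."""
    u = np.random.default_rng(0).normal(size=S.shape[0])
    u = u / np.linalg.norm(u)
    for _ in range(num_iters):
        v = S @ u
        v = v / np.linalg.norm(v)         # re-normalize each step
        if np.linalg.norm(v - u) < tol:   # converged
            u = v
            break
        u = v
    return u @ S @ u, u                   # Rayleigh quotient ~ eigenvalue, and eigenvector
```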

Explaining Variance
S is positive semi-definite, so its eigenvalues are non-negative.
The eigenvalues give the variance explained by each component; an eigenvalue's share of the total is the proportion of variance explained (0 = ignore!).
So: sort the eigenvalues descending and apply a threshold to their cumulative proportion. Often 90%. Or pick k at an elbow, or use a fixed constant (e.g. for visualization).
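A small helper (hypothetical, not part of the lecture) implementing the cumulative-proportion rule for choosing k:

```python
import numpy as np

def choose_k(eigvals, threshold=0.90):
    """Smallest k whose top-k eigenvalues reach `threshold` of the eigenvalue sum."""
    eigvals = np.sort(eigvals)[::-1]                   # sort descending
    explained = np.cumsum(eigvals) / np.sum(eigvals)   # cumulative proportion of variance
    return int(np.searchsorted(explained, threshold) + 1)

print(choose_k(np.array([5.0, 3.0, 1.5, 0.4, 0.1])))   # -> 3
```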

Wheat Data
[Figure: the wheat data projected onto the top 2 components vs. the bottom 2 components.]

Face Image Dataset
[Figure]

Singular Value Decomposition (SVD)
X = U S V^T
X^T X = V S^T U^T U S V^T = V S^T I S V^T = V S^T S V^T = V D V^T
A more general decomposition: basis vectors for both the rows and the columns of the data (vs. PCA: rows only). Many other uses (e.g. Latent Semantic Analysis).
SVD yields the same decomposition as PCA when the mean of the data is 0.
(S)ingular values = sqrt(eigenvalues)
Project via X V_k
Practice in HW!
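A NumPy sketch of the PCA-via-SVD connection (my own example, not from the slides): center the data, take the SVD, project via X V_k, and confirm that the squared singular values match the covariance eigenvalues up to the 1/(N-1) scaling used by np.cov:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)                  # center: SVD matches PCA when the mean is 0

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
X_proj = Xc @ Vt[:k].T                   # project via X V_k

# squared singular values = (N - 1) * covariance eigenvalues
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]   # descending
print(np.allclose(s[:k] ** 2 / (len(Xc) - 1), eigvals[:k]))    # True
```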

SVD [figure]

PCA vs SVD
Can calculate the SVD without forming X^T X:
  Saves computation
  Prevents loss of precision
Actual performance depends upon N vs. D:
  Eigendecomposition: O(ND^2 + D^3)
  SVD: O(min{ ND^2, N^2 D })
But SVD tends to be the default.

Core Assumptions
Linearity: can be addressed via the kernel trick.
Focus on orthogonality and variance: other approaches exist and may be more appropriate.
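To illustrate the kernel-trick remedy for the linearity assumption, here is a hedged example using scikit-learn's KernelPCA (an external library and dataset of my choosing, not covered in the slides) on data that linear PCA cannot separate:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA, PCA

# Two concentric circles: no linear projection separates them
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

X_lin = PCA(n_components=2).fit_transform(X)            # linear PCA: circles stay nested
X_rbf = KernelPCA(n_components=2, kernel="rbf",
                  gamma=10.0).fit_transform(X)          # kernel PCA can unfold them

print(X_lin.shape, X_rbf.shape)   # (300, 2) (300, 2)
```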

Independent Component Analysis (ICA)
Whereas PCA finds uncorrelated components by maximizing variance, ICA seeks components that are statistically independent.
Like PCA, this is achieved via a linear transformation.

ICA Assumption
[Diagram: hidden independent variables generate the observable variables; the goal is to find the hidden variables via the observables.]

ICA Example: Cocktail Party
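A toy cocktail-party sketch using scikit-learn's FastICA (my choice of library and of synthetic signals; not from the lecture): two hidden sources are linearly mixed into two "microphone" recordings, and ICA recovers statistically independent estimates of the sources:

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]   # two hidden sources: sine + square wave
A = np.array([[1.0, 0.5], [0.4, 1.2]])             # unknown mixing matrix (the "room")
X = S @ A.T                                        # two observed microphone signals

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                       # estimated independent components
print(S_est.shape)                                 # (2000, 2): sources recovered up to scale/order
```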

Autoencoders

Autoencoders
Also known as an autoassociative neural network.
Architecture: the same number of inputs and outputs, with fewer code nodes than inputs/outputs.
Goal: reproduce the input at the output.
With only one hidden layer (i.e. the code), this is equivalent to PCA; with more layers it performs a nonlinear PCA :)
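A minimal PyTorch sketch of this architecture (the framework, sizes, and names are my own assumptions: 10 inputs/outputs and a 3-node code layer). With a single linear code layer it learns the same subspace as PCA; inserting nonlinear hidden layers turns it into a nonlinear reducer:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, d_in=10, d_code=3):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_code)   # code layer: fewer nodes than inputs
        self.decoder = nn.Linear(d_code, d_in)   # same number of outputs as inputs

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                           # goal: reproduce the input at the output

X = torch.randn(256, 10)                         # toy data
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), X)                  # compare output to input
    loss.backward()
    opt.step()
```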

More Than You Wanted to Know