Dimensionality Reduction


Dimensionality Reduction
Le Song, Machine Learning I, CSE 6740, Fall 2013

Unsupervised learning
Learning from raw (unlabeled, unannotated, etc.) data, as opposed to supervised data where a classification of examples is given.
Explore and understand your data before you create expensive labeled data or build predictive models.
Extract meaningful patterns or compact representations from raw data according to certain criteria.
More subjective compared to supervised learning, and harder to evaluate.

Document collections
What are the relations between data points?

Image databases
What are the relations between data points?

Handwritten digits
What are the relations between data points?

Cartoon characters
What are the relations between data points?

So what is dimensionality reduction?
The process of reducing the number of random variables under consideration. One can combine, transform or select variables. The dimension-reduced data can be used for visualizing, exploring and understanding the data, cleaning the data, and building simpler models later.
Issues for dimensionality reduction: How to represent objects? (Vector space? Normalization?) What is the criterion for carrying out the reduction process? What are the algorithm steps?

Bag-of-words representation
Document 1: "Machine learning concerns the construction and study of systems that can learn from data." Document 2: "Representation of data instances and functions evaluated on these instances are part of all machine learning systems." Counting how often vocabulary terms such as learn, represent, system, data, instance and function occur in each document turns each document into a count vector in R^n.
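
The term-count table on this slide does not survive transcription, so here is a small MATLAB sketch (my own illustration, with an assumed vocabulary and crude prefix matching) of how the two example documents become count vectors:

```matlab
% Minimal bag-of-words sketch (illustrative vocabulary and documents).
docs = { ...
    'machine learning concerns the construction and study of systems that can learn from data', ...
    'representation of data instances and functions evaluated on these instances are part of all machine learning systems'};
vocab = {'learn', 'represent', 'system', 'data', 'instance', 'function'};

counts = zeros(numel(vocab), numel(docs));   % one column per document
for d = 1:numel(docs)
    words = strsplit(lower(docs{d}));
    for v = 1:numel(vocab)
        % crude prefix match so 'systems' counts toward 'system', etc.
        counts(v, d) = sum(strncmp(words, vocab{v}, length(vocab{v})));
    end
end
disp(counts)   % each column is a document represented as a vector in R^n
```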

Pixel representation
Each image is flattened into a vector of pixel values, i.e., a vector in R^n.

Images of different sizes
Describe each image by features such as color, texture and composition, again giving a vector in R^n.

Objects in real life
Encode categorical attributes numerically, e.g. Family: 0/1/2, Sex: 0/1, Work place: 0/1/2/3, giving a vector in R^n.

What criterion to use for reduction?
There are many criteria (geometry based, information theory based, etc.). We want to capture variation in the data: variations are the signals or information in the data, but we need to normalize each variable first. We also want to discover variables or dimensions that are highly correlated or dependent: they represent highly related phenomena, so we can combine them to form a simpler representation.
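
A minimal sketch (my own, not from the lecture) of the normalization step mentioned above, centering each variable and scaling it to unit standard deviation before measuring variation:

```matlab
% Standardize each variable (row) of an n-by-m data matrix X
% so that no variable dominates the variance just because of its scale.
X = diag([1, 10, 0.1]) * randn(3, 100);      % toy data: 3 variables on very different scales
mu = mean(X, 2);                             % per-variable mean
sigma = std(X, 0, 2);                        % per-variable standard deviation
Xstd = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);
```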

An example
(Scatter plot of two correlated features: the data vary more along one direction and less along the orthogonal direction.)

Reduce to 1 dimension

Principal component analysis
Given $m$ data points $x_1, x_2, \dots, x_m \in \mathbb{R}^n$, with their mean $\mu = \frac{1}{m}\sum_{i=1}^m x_i$, find a direction $w \in \mathbb{R}^n$ with $\|w\| = 1$ such that the variance (or variation) of the data along direction $w$ is maximized:
$\max_{w:\,\|w\|=1} \; \frac{1}{m}\sum_{i=1}^m (w^\top x_i - w^\top \mu)^2$ (the variance along $w$)
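
To make the objective concrete, here is a small MATLAB sketch (my own illustration on synthetic 2-D data) that evaluates the projected variance for many unit directions $w$ and picks the best one by brute force:

```matlab
% Brute-force search for the unit direction w that maximizes the
% variance of the projections w' * x_i (only feasible here because n = 2).
rng(0);
X = [2 0.8; 0.8 1] * randn(2, 500);          % correlated 2-D data, one point per column
mu = mean(X, 2);
thetas = linspace(0, pi, 1000);
best = -inf;
for theta = thetas
    w = [cos(theta); sin(theta)];            % a unit vector
    v = mean((w' * X - w' * mu).^2);         % variance of the data along w
    if v > best
        best = v; wbest = w;
    end
end
fprintf('max variance %.3f in direction (%.3f, %.3f)\n', best, wbest(1), wbest(2));
```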

Is it an easy optimization problem?
Manipulate the objective with linear algebra:
$\frac{1}{m}\sum_{i=1}^m (w^\top x_i - w^\top \mu)^2 = \frac{1}{m}\sum_{i=1}^m \big(w^\top (x_i - \mu)\big)^2 = \frac{1}{m}\sum_{i=1}^m w^\top (x_i - \mu)(x_i - \mu)^\top w = w^\top \Big[\frac{1}{m}\sum_{i=1}^m (x_i - \mu)(x_i - \mu)^\top\Big] w$
The term in brackets is the covariance matrix $C$, so the objective is $w^\top C w$.
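
A quick numeric check (my own sketch on random data) that the manipulated objective really equals $w^\top C w$ for the covariance matrix $C$:

```matlab
% Verify that (1/m) * sum_i (w'x_i - w'mu)^2 equals w' * C * w,
% where C = (1/m) * sum_i (x_i - mu)(x_i - mu)'.
rng(1);
n = 4; m = 200;
X = randn(n, m);                             % one data point per column
mu = mean(X, 2);
Xc = bsxfun(@minus, X, mu);                  % centered data
C = (Xc * Xc') / m;                          % covariance matrix
w = randn(n, 1); w = w / norm(w);            % an arbitrary unit direction
lhs = mean((w' * X - w' * mu).^2);
rhs = w' * C * w;
fprintf('lhs = %.6f, rhs = %.6f\n', lhs, rhs);  % the two numbers agree
```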

Landscape of the optimization problem
Suppose the data have two dimensions ($n = 2$) and $C$ is the diagonal matrix $C = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$. The optimization problem becomes
$\max_{w:\,\|w\|=1} w^\top C w = \max_{w:\,\|w\|=1} (w_1, w_2) \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = \max_{w:\,\|w\|=1} \; w_1^2 + 2 w_2^2$

Landscape of the optimization problem f w, w 2 = w 2 + 2w 2 2 3 2.5 2.5.5.5 -.5 - - -.5.5 8

Eigenvalue problem
Given a symmetric matrix $C \in \mathbb{R}^{n \times n}$, find a vector $w \in \mathbb{R}^n$ with $\|w\| = 1$ such that $Cw = \lambda w$. There will be multiple solutions $w_1, w_2, \dots$ with different $\lambda_1, \lambda_2, \dots$ They are orthonormal: $w_i^\top w_i = 1$ and $w_i^\top w_j = 0$ for $i \neq j$.
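
A minimal sketch (my own, on a random symmetric matrix) checking the two properties stated above:

```matlab
% Eigenvectors of a symmetric matrix: C*w = lambda*w, and the w_i are orthonormal.
rng(2);
A = randn(4); C = (A + A') / 2;              % a random symmetric matrix
[W, D] = eig(C);                             % columns of W are eigenvectors, D is diagonal
disp(norm(C * W(:,1) - D(1,1) * W(:,1)))     % ~0: C*w1 = lambda1*w1
disp(norm(W' * W - eye(4)))                  % ~0: the eigenvectors are orthonormal
```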

Equivalent to an eigenvalue problem
Claim: the solution of $\max_{w:\,\|w\|=1} w^\top C w$ satisfies $Cw = \lambda w$. Form the Lagrangian of the optimization problem, $L(w, \lambda) = w^\top C w + \lambda(1 - \|w\|^2)$. Necessary condition: if $w$ is a maximum of the original optimization problem, then there exists a $\lambda$ such that $(w, \lambda)$ is a stationary point of $L(w, \lambda)$. This implies $\frac{\partial L}{\partial w} = 0 = 2Cw - 2\lambda w$, i.e. $Cw = \lambda w$.

Principal direction of the data
(The direction $w$ overlaid on the example data scatter plot.)

Variance in the principal direction
The principal direction $w$ satisfies $Cw = \lambda w$, so the variance in the principal direction is $w^\top C w = \lambda\, w^\top w = \lambda$, the eigenvalue.
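
A numerical illustration (my own sketch on synthetic data) that the variance of the data along the principal direction equals the corresponding eigenvalue:

```matlab
% The variance of the projections onto the top eigenvector of C
% equals the top eigenvalue of C.
rng(3);
X = randn(3, 3) * randn(3, 1000);            % correlated 3-D data, one point per column
mu = mean(X, 2);
Xc = bsxfun(@minus, X, mu);
C = (Xc * Xc') / size(X, 2);
[W, D] = eig(C);
[lambda1, i] = max(diag(D));
w1 = W(:, i);                                % principal direction
fprintf('variance along w1 = %.6f, lambda1 = %.6f\n', mean((w1' * Xc).^2), lambda1);
```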

Multiple principal directions
We want directions $w_1, w_2, \dots$ that have the largest variances but are orthogonal to each other: take the eigenvectors $w_1, w_2, \dots$ of $C$ corresponding to the largest eigenvalue $\lambda_1$, the second largest eigenvalue $\lambda_2$, and so on.
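
A sketch (my own illustration, synthetic data) of using the top-$k$ eigenvectors to reduce the dimension of the data:

```matlab
% Reduce n-dimensional data to k dimensions by projecting onto the
% k eigenvectors of C with the largest eigenvalues.
rng(4);
n = 10; m = 500; k = 2;
X = randn(n, 3) * randn(3, m) + 0.1 * randn(n, m);   % data with roughly 3 intrinsic dimensions
mu = mean(X, 2);
Xc = bsxfun(@minus, X, mu);
C = (Xc * Xc') / m;
[W, D] = eig(C);
[~, order] = sort(diag(D), 'descend');       % eig does not guarantee any ordering
Wk = W(:, order(1:k));                       % top-k principal directions
Z = Wk' * Xc;                                % k-by-m matrix of reduced representations
```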

Solving the eigenvalue problem
Not an easy task in general, but eigendecomposition is implemented in many modern linear algebra libraries, and large-scale, parallel, distributed, and iterative implementations also exist. For instance, in MATLAB, [W, S] = eig(C) returns all eigenvectors and [W, S] = eigs(C, k) returns k eigenvectors.
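
The two MATLAB calls mentioned on this slide, with the sorting step one usually adds because eig does not order the eigenvalues (a sketch; the random covariance matrix is my own toy example):

```matlab
C = cov(randn(200, 6));                      % a 6-by-6 symmetric covariance matrix
[W, S] = eig(C);                             % all eigenvectors / eigenvalues
[vals, order] = sort(diag(S), 'descend');    % sort eigenvalues from largest to smallest
W = W(:, order);

k = 2;
[Wk, Sk] = eigs(C, k);                       % only the k largest-magnitude eigenpairs
```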

Experiments with handwritten digits

Experiments with 20 newsgroups
Bag-of-words, or term-document matrix.

Singular value decomposition
For a matrix $X$, decompose it as $X = U S V^\top$. A singular vector pair $(u, v)$ with singular value $s$ is related by $Xu = sv$ and $X^\top v = su$. The singular value decomposition is related to eigendecomposition: let $C = XX^\top$; then $Cv = X X^\top v = s\,Xu = s^2 v$, so $Cv = \lambda v$ with $\lambda = s^2$.
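
A quick check (my own sketch on a random matrix) of the stated relation between the SVD of $X$ and the eigendecomposition of $C = XX^\top$: the squared singular values of $X$ match the eigenvalues of $XX^\top$.

```matlab
% Relation between svd(X) and eig(X * X').
rng(5);
X = randn(5, 40);
s = svd(X);                                  % singular values of X (sorted descending)
lambda = sort(eig(X * X'), 'descend');       % eigenvalues of C = X * X'
disp([s.^2, lambda])                         % the two columns agree
```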