Dimensionality Reduction. CS57300 Data Mining, Fall 2016. Instructor: Bruno Ribeiro


Goal
- Visualize high-dimensional data (and understand its geometry)
- Project the data into lower-dimensional spaces
- Understand how to model data

A Geometric Embedding Problem
- Ever wondered why flights from the U.S. to Europe, India, China, and the Middle East seem to take the longest route?
- E.g., the figure shows a Dubai to Los Angeles route: this is the shortest path. We were looking at the wrong geometric embedding.

Embedding of High Dimensional Data
- Example: Bank Loans
  1. Amount Requested
  2. Interest Rate Percentage
  3. Loan Length in Months
  4. Loan Title
  5. Loan Purpose
  6. Monthly Payment
  7. Total Amount Funded
  8. Debt-To-Income Ratio Percentage
  9. FICO Range (https://en.wikipedia.org/wiki/fico)
  10. Status (1 = Paid or 0 = Not Paid)
- What is the best low-dimensional embedding?

Today
- Principal Component Analysis (PCA) is a linear projection that maximizes the variance of the projected data.
  - The output of PCA is a set of k orthogonal vectors in the original p-dimensional feature space, the k principal components, k <= p.
  - PCA gives us uncorrelated components, which are generally not independent components; for that you need independent component analysis.
- Independent Component Analysis (ICA)
  - A weaker form of independence is uncorrelatedness: two random variables y_1 and y_2 are said to be uncorrelated if their covariance is zero.
  - But uncorrelatedness does not imply independence (nonlinear dependencies may remain).
  - We would like to learn hidden independence in the data.

PCA Formulation
- Goal: find a linear projection that maximizes the variance of the projected data.
  - Given a p-dimensional observed r.v. x, find U and z such that x = Uz and z has uncorrelated components z_i.
- Due to non-uniqueness, U is assumed unitary.
- Best practices (Standardization = Centering + Scaling):
  - Centered PCA: the variables x are first centered so that they have zero mean.
  - Scaled PCA: each variable is scaled to have unit variance (divide each column by its standard deviation).
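A minimal sketch of the centering-and-scaling step, assuming the data is a NumPy array with one column per variable (the helper name is just for illustration):

```python
import numpy as np

def standardize(X):
    """Center each column to zero mean and scale to unit variance
    (the 'centered + scaled PCA' preprocessing described above)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0           # guard against constant columns
    return (X - mu) / sigma
```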

PCA Properties
- Principal Component Analysis (PCA) is a linear projection that maximizes the variance of the projected data.
  - The output of PCA is a set of k orthogonal vectors in the original p-dimensional feature space, the k principal components, k <= p.
  - Principal components are weighted combinations of the original features.
  - The projections onto the principal components are uncorrelated.
  - No other projection onto k dimensions captures more of the variance.
  - The first principal component is the direction in the feature space along which the data has the most variance.
  - The second principal component is the direction orthogonal to the first component with the most variance.

PCA Algorithm
- Let X be an N x p matrix with N measurements of dimension p (columns centered).
- Form A = X^T X.
- Let A = P Λ P^T, where Λ is the diagonal matrix of eigenvalues and P is the matrix of eigenvectors.
- PCA projection: Z = X P.
- Standard deviation of component j: std_j = sqrt(λ_j / N), where λ_j = Λ_jj.
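A minimal Python sketch of this algorithm, assuming the rows of X are observations; the pca name and return values are illustrative, not the course's code:

```python
import numpy as np

def pca(X, k):
    """PCA of an N x p data matrix X via eigendecomposition of A = X^T X,
    following the steps above. Returns the top-k components, the projected
    data, and the per-component standard deviations."""
    X = X - X.mean(axis=0)                 # center the columns
    N = X.shape[0]
    A = X.T @ X
    eigvals, P = np.linalg.eigh(A)         # eigh: A is symmetric
    order = np.argsort(eigvals)[::-1]      # sort by decreasing eigenvalue
    eigvals, P = eigvals[order], P[:, order]
    Z = X @ P[:, :k]                       # projection onto the first k components
    std = np.sqrt(eigvals[:k] / N)         # standard deviation of each component
    return P[:, :k], Z, std
```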

How Many Components to Use?
- Find a steep drop in the standard deviation std_j.
- [Scree plot: variance of each PCA component j; the variance drops sharply after the first few components.]
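A toy illustration of the "steep drop" heuristic on made-up data with three signal directions; the data, the noise level, and the drop rule below are all assumptions for illustration:

```python
import numpy as np

# Toy data: 3 orthonormal signal directions embedded in 8 dimensions, plus small noise.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(8, 3)))          # 3 orthonormal directions in R^8
X = rng.normal(size=(500, 3)) @ Q.T + 0.05 * rng.normal(size=(500, 8))
X = X - X.mean(axis=0)

eigvals = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
std = np.sqrt(eigvals / X.shape[0])                   # std_j of each PCA component
k = int(np.argmax(std[:-1] - std[1:])) + 1            # keep components up to the steepest drop
print("component std:", np.round(std, 3), "-> keep k =", k)
```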

Independent Component Analysis (ICA) Formulation
- Goal: given a p-dimensional observed r.v. x, find H and s such that x = Hs and s has mutually statistically independent components s_i.
- Known as the blind source separation problem.

Blind Source Separation: The Cocktail Party Problem
- [Diagram: sources s_1, s_2 pass through the mixing matrix H to give observations x_1, x_2.]
- x = Hs, with k sources and N = k observations.
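A tiny sketch of the generative setup only, producing x = Hs for two made-up sources and a 2 x 2 mixing matrix of arbitrary values:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2000)

# Two made-up independent sources (e.g., two speakers at the party).
s1 = np.sign(np.sin(7 * 2 * np.pi * t))        # square wave
s2 = rng.laplace(size=t.size)                  # heavy-tailed noise
S = np.vstack([s1, s2])                        # 2 x T matrix of sources

H = np.array([[1.0, 0.6],                      # unknown mixing matrix (arbitrary here)
              [0.4, 1.0]])
X = H @ S                                      # 2 x T matrix of observations x = Hs
```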

- The s_i must be statistically independent: p(s_1, s_2) = p(s_1) p(s_2).
- ICA does not work for Gaussian distributions: the joint density of unit-variance Gaussian s_1 and s_2 is rotationally symmetric, so it carries no information about the directions (column vectors) of the mixing matrix H. Thus H cannot be estimated.
- If only one independent component is Gaussian, the estimation is still possible.
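A small numerical illustration of the Gaussian ambiguity: with i.i.d. unit-variance Gaussian sources, any orthogonal mixing matrix leaves the observed zero-mean Gaussian distribution (fully determined by its covariance) unchanged, so H cannot be recovered. The rotation angle below is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=(2, 100_000))                  # i.i.d. unit-variance Gaussian sources

theta = 0.7                                        # arbitrary rotation (orthogonal mixing)
H = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = H @ S

# Covariance of the mixtures is (approximately) the identity, exactly as for the
# unmixed sources: the observed Gaussian distribution does not reveal H.
print(np.round(np.cov(X), 2))
```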

[Scatter plots in the original coordinates, with the source directions s_1, s_2 and the mixing weights h_{i,1}, h_{i,2} marked.]
- Find the vectors that describe the data set best: each point is a linear combination x_i = h_{i,1} s_1 + h_{i,2} s_2.

ICA Uniqueness
- If s has independent components s_i, then so does ΛPs, where Λ is an invertible diagonal matrix and P is a permutation.
- If (H, s) is a solution, then (HΛP, P^T Λ^{-1} s) is also a solution.
- Essential uniqueness: the solution is unique up to trivial transformations, i.e. a scale-permutation.
- There is a whole equivalence class of solutions; just find one representative solution.
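A quick check of the scale-permutation ambiguity; the mixing, diagonal, and permutation matrices below are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))                    # independent sources
H = np.array([[1.0, 0.6], [0.4, 1.0]])             # some mixing matrix

D = np.diag([2.0, -0.5])                           # invertible diagonal (rescaling)
P = np.array([[0.0, 1.0], [1.0, 0.0]])             # permutation

H2 = H @ D @ P                                     # rescaled and permuted mixing matrix
s2 = P.T @ np.linalg.inv(D) @ s                    # correspondingly transformed sources

# Both factorizations produce exactly the same observations x = Hs = H2 s2.
print(np.allclose(H @ s, H2 @ s2))                 # True
```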

An ICA Algorithm (Example of noise-free ICA)
- Observed values x such that x = Hs.
- One way is to minimize mutual information, which is equivalent to the well-known Kullback-Leibler divergence between the joint density of s and the product of its marginal densities.
- Define the distribution family of the sources, s_i ~ g (assumed known, often tanh-based).
- Let W = H^{-1}, with rows w_j^T. Since the sources are independent,
  p(x) = Π_{j=1}^{p} p_s(w_j^T x) · |det W|
- Given a training set {x^(i); i = 1, ..., N}, the log-likelihood of W is given by
  log P[X | W] = Σ_{i=1}^{N} Σ_{j=1}^{p} log g'(w_j^T x^(i)) + N log |det W|
  (Pham et al. 1992)
- D.T. Pham, P. Garat and C. Jutten, "Separation of a mixture of independent sources through a maximum likelihood approach", EUSIPCO, 1992.
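A minimal sketch of maximum-likelihood ICA by stochastic gradient ascent on the log-likelihood above, assuming a logistic (sigmoid) CDF g for each source; this follows the standard noise-free derivation rather than the exact Pham et al. procedure, and the learning rate and epoch count are arbitrary:

```python
import numpy as np

def ml_ica(X, lr=0.001, n_epochs=50):
    """X: N x p array of observations x^(i). Returns the unmixing matrix W = H^{-1}.
    Per-sample gradient of the log-likelihood with sigmoid source CDF g:
        dl/dW = (1 - 2 g(W x)) x^T + (W^T)^{-1}"""
    N, p = X.shape
    W = np.eye(p)
    g = lambda z: 1.0 / (1.0 + np.exp(-z))        # assumed source CDF
    for _ in range(n_epochs):
        for x in X:                               # one stochastic gradient step per sample
            W += lr * (np.outer(1.0 - 2.0 * g(W @ x), x) + np.linalg.inv(W.T))
    return W

# Recovered sources (up to scale and permutation): S_hat = X @ W.T
```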

PCA vs. ICA
- Which method (PCA or ICA) provides the best projection?
- [Plots: the data in the original coordinates, in PCA coordinates, and in ICA coordinates.]
- PCA: orthogonality is bad for non-orthogonal data.
- ICA: forcing independence but not orthogonality shows the true dimensions of the data.
- R Code: https://www.cs.purdue.edu/homes/ribeirob/courses/spring2016/extra/pca_vs_ica.r
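The comparison on this slide uses the R script linked above; as a rough Python analogue (not the course code), scikit-learn's PCA and FastICA can be run on made-up non-orthogonal data:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# Made-up data with two independent, non-orthogonal latent directions.
rng = np.random.default_rng(0)
S = rng.laplace(size=(1000, 2))                      # independent, non-Gaussian sources
H = np.array([[1.0, 0.3], [0.8, 1.0]])               # non-orthogonal mixing
X = S @ H.T

Z_pca = PCA(n_components=2).fit_transform(X)                       # uncorrelated, orthogonal axes
Z_ica = FastICA(n_components=2, random_state=0).fit_transform(X)   # independent axes
```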

Correlation vs. Independence
- Example 1: mixture of 2 identically distributed sources.
- Consider the mixture of two independent sources
  x_1 = s_1 + s_2,  x_2 = s_1 - s_2  (mixing matrix [[1, 1], [1, -1]]),
  where E[s_i^2] = 1 and E[s_i] = 0.
- Then the x_i are uncorrelated, since
  E[x_1 x_2] - E[x_1] E[x_2] = E[s_1^2] - E[s_2^2] = 0.
- But the x_i are not independent since, for example,
  E[x_1^2 x_2^2] - E[x_1^2] E[x_2^2] = E[s_1^4] + E[s_2^4] - 6 ≠ 0
  (in general, for non-Gaussian sources).
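A quick Monte Carlo check of this example, using uniform sources scaled to unit variance (any non-Gaussian choice would do; here E[s^4] = 1.8, so the second quantity is about -2.4):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
# Independent sources, uniform on [-sqrt(3), sqrt(3)] so that E[s] = 0 and E[s^2] = 1.
s1 = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)
s2 = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)

x1, x2 = s1 + s2, s1 - s2

print(np.mean(x1 * x2) - np.mean(x1) * np.mean(x2))              # ~ 0   (uncorrelated)
print(np.mean(x1**2 * x2**2) - np.mean(x1**2) * np.mean(x2**2))  # ~ -2.4, not 0 (dependent)
```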

PCA vs. ICA (Round 2)
- Which method (PCA or ICA) provides the best projection?
- [Plots: a nonlinear data set in the original coordinates, in PCA coordinates, and in ICA coordinates.]
- ICA works better even though the data is non-linear.
- R Code: https://www.cs.purdue.edu/homes/ribeirob/courses/spring2016/extra/pca_vs_ica_nonlinear.r

Sometimes Data is Locally Linear
- Rapper B.o.B. is talking about manifolds.
- Earth's surface is a manifold that is locally flat; linear algebra is all you need if you are not going far.

ISOMAP
- Use manifolds to project data into a lower-dimensional space; the data geometry is locally linear.
- Construct an adjacency matrix A from the data: A_{i,j} = 1 if points x_i and x_j are nearby.
- Find a low-dimensional embedding of A. Any linear algebra tool will do, e.g. the spectral decomposition of the symmetric normalized Laplacian (equivalently, of D^{-1/2} A D^{-1/2}).
- [Plot: a nonlinear curve of points in the original two-dimensional coordinates.]
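A minimal sketch of the embedding step described on this slide: build a neighborhood adjacency matrix and embed with the leading eigenvectors of D^{-1/2} A D^{-1/2}. This is a spectral / Laplacian-eigenmaps-style embedding of the slide's recipe rather than classical Isomap (which uses geodesic distances), and the neighborhood radius is an arbitrary assumption:

```python
import numpy as np

def spectral_embedding(X, radius=0.3, k=2):
    """X: N x p data matrix. Returns an N x k embedding from the top
    eigenvectors of D^{-1/2} A D^{-1/2}, where A_{ij} = 1 for nearby points."""
    # Pairwise distances and epsilon-neighborhood adjacency matrix.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = (d < radius).astype(float)
    np.fill_diagonal(A, 0.0)

    deg = A.sum(axis=1)
    deg[deg == 0] = 1.0                                   # isolated points: avoid divide-by-zero
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    M = D_inv_sqrt @ A @ D_inv_sqrt                       # symmetrically normalized adjacency

    eigvals, eigvecs = np.linalg.eigh(M)                  # eigendecomposition (M is symmetric)
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]      # top-k eigenvectors as coordinates
```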