
Lecture 6, Sept. 25, 2006
Data Visualization STAT 442 / 890, CM 462
Lecture: Ali Ghodsi

1 Dual PCA

It turns out that the singular value decomposition also allows us to formulate the principal components algorithm entirely in terms of dot products between data points, limiting the direct dependence on the original dimensionality n. This fact will become important below.

Assume that the dimensionality n of the n × t matrix of data X is large (i.e., n >> t). In this case, Algorithm 1 (Table ??) is impractical. We would prefer a run time that depends only on the number of training examples t, or that at least has a reduced dependence on n. Note that in the SVD factorization X = UΣV^T, the eigenvectors in U corresponding to nonzero singular values in Σ (square roots of eigenvalues) are in a one-to-one correspondence with the eigenvectors in V.

Now assume that we perform dimensionality reduction on U and keep only the first d eigenvectors, corresponding to the top d nonzero singular values in Σ. These eigenvectors will still be in a one-to-one correspondence with the first d eigenvectors in V:

    XV = UΣ

where the dimensions of these matrices are:

    X : n × t
    U : n × d
    Σ : d × d (diagonal)
    V : t × d

Crucially, Σ is now square and invertible, because its diagonal has nonzero entries. Thus, the following conversion between the top d eigenvectors can be derived:

    U = XVΣ^{-1}    (1)

Replacing all uses of U in Algorithm 1 with XVΣ^{-1} gives us the dual form of PCA, Algorithm 2 (see Table 1). Note that in Algorithm 2 (Table 1), the steps "Reconstruct training data" and "Reconstruct test example" still depend on n, and therefore will still be impractical when the original dimensionality n is very large. However, all other steps can be carried out in a run time that depends only on the number of training examples t.
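As a quick numerical check of equation (1) (this sketch is my own, not part of the original notes), the NumPy snippet below builds a random n × t matrix X with n >> t, recovers V and Σ from the small t × t matrix X^T X, and confirms that XVΣ^{-1} matches the top d left singular vectors of X up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, d = 1000, 20, 5                       # high dimension n, few examples t
X = rng.standard_normal((n, t))

# Dual route: eigendecompose the small t x t matrix X^T X.
evals, evecs = np.linalg.eigh(X.T @ X)      # eigenvalues in ascending order
idx = np.argsort(evals)[::-1][:d]           # indices of the top d eigenvalues
V = evecs[:, idx]                           # t x d
Sigma = np.diag(np.sqrt(evals[idx]))        # d x d matrix of singular values

U_dual = X @ V @ np.linalg.inv(Sigma)       # equation (1): U = X V Sigma^{-1}

# Direct route: left singular vectors from the full SVD.
U_svd, _, _ = np.linalg.svd(X, full_matrices=False)

# The columns agree up to sign.
assert np.allclose(np.abs(U_dual), np.abs(U_svd[:, :d]), atol=1e-6)
print("U = X V Sigma^{-1} reproduces the top", d, "left singular vectors")
```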

Algorithm 2

Recover basis: Calculate X^T X and let V = eigenvectors of X^T X corresponding to the top d eigenvalues. Let Σ = diagonal matrix of square roots of the top d eigenvalues.

Encode training data: Y = U^T X = ΣV^T, where Y is a d × t matrix of encodings of the original data.

Reconstruct training data: X̂ = UY = UΣV^T = XVΣ^{-1}ΣV^T = XVV^T.

Encode test example: y = U^T x = Σ^{-1}V^T X^T x, where y is a d-dimensional encoding of x.

Reconstruct test example: x̂ = Uy = UU^T x = XVΣ^{-2}V^T X^T x.

Table 1: Dual PCA Algorithm
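The following NumPy sketch implements Algorithm 2 as described above (the function and variable names are my own, not from the notes). Only the t × t matrix X^T X is ever eigendecomposed; the reconstruction step is included to mirror Table 1, even though, as noted above, it still touches the n-dimensional data.

```python
import numpy as np

def dual_pca(X, d):
    """Dual PCA (Algorithm 2): all heavy computation is on the t x t matrix X^T X."""
    evals, evecs = np.linalg.eigh(X.T @ X)       # eigendecomposition of X^T X
    idx = np.argsort(evals)[::-1][:d]            # top d eigenvalues
    V = evecs[:, idx]                            # t x d
    Sigma = np.diag(np.sqrt(evals[idx]))         # d x d
    Y = Sigma @ V.T                              # encode training data: Y = U^T X = Sigma V^T
    return V, Sigma, Y

def encode_test(X, V, Sigma, x):
    """y = Sigma^{-1} V^T X^T x (d-dimensional encoding of a test point x)."""
    return np.linalg.inv(Sigma) @ V.T @ X.T @ x

def reconstruct_test(X, V, Sigma, x):
    """x_hat = X V Sigma^{-2} V^T X^T x (still depends on n, as noted above)."""
    Sigma_inv = np.linalg.inv(Sigma)
    return X @ V @ Sigma_inv @ Sigma_inv @ V.T @ X.T @ x

# Tiny usage example with centered data.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 30))
X = X - X.mean(axis=1, keepdims=True)            # center each feature across examples
V, Sigma, Y = dual_pca(X, d=3)
x = X[:, 0]
print(Y.shape, encode_test(X, V, Sigma, x).shape)  # (3, 30) (3,)
```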

2 Kernel PCA

Consider a feature space H such that:

    Φ : X → H
        x ↦ Φ(x)

In kernel PCA we map from the original space X into a feature space H and then into the lower dimensional or embedded space Y:

    X → H → Y

The goal is to reduce the dimensionality from the space X to the dimensionality of the space Y by passing through H, without having to know Φ(x) explicitly. Suppose ∑_{i=1}^{t} Φ(x_i) = 0 (we will return to this point below, and show how this condition can be satisfied in a Hilbert space). This allows us to formulate the kernel PCA objective as follows:

    min_{U_q} ∑_{i=1}^{t} ||Φ(x_i) - U_q U_q^T Φ(x_i)||^2

By the same argument used for PCA, the solution can be found by SVD:

    Φ(X) = UΣV^T

where U contains the eigenvectors of Φ(X)Φ(X)^T. Note that if Φ(X) is n × t and the dimensionality n of the feature space is large, then U is n × n, which makes PCA impractical. To reduce the dependence on n, first assume that we have a kernel K(·, ·) that allows us to compute K(x, y) = Φ(x)^T Φ(y). Given such a function, we can compute the matrix Φ(X)^T Φ(X) = K efficiently, without computing Φ(X) explicitly. Crucially, K is t × t here and does not depend on n, so it can be computed in a run time that depends only on t. Also, recall that PCA can be formulated entirely in terms of dot products between data points (Algorithm 2 in Table 1). Replacing the dot products in Algorithm 2 (Table 1) by the kernel function K, which is in fact the inner product of a Hilbert space, yields the Kernel PCA algorithm.
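To make the role of the kernel concrete, here is a small NumPy sketch (my own illustration; the notes do not fix a particular kernel) that computes the t × t Gram matrix K = Φ(X)^T Φ(X) for the RBF kernel, whose feature map Φ is infinite-dimensional and is never formed explicitly.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2).

    Columns of A and B are data points (following the notes' column
    convention). For this kernel the feature map Phi is infinite-
    dimensional, yet K is computed directly from the inputs.
    """
    sq = (np.sum(A**2, axis=0)[:, None]
          + np.sum(B**2, axis=0)[None, :]
          - 2.0 * A.T @ B)                     # pairwise squared distances
    return np.exp(-gamma * sq)

rng = np.random.default_rng(2)
p, t = 20, 50                                  # p-dimensional inputs, t examples
X = rng.standard_normal((p, t))
K = rbf_kernel(X, X)                           # t x t; Phi(X) is never formed
print(K.shape)                                 # (50, 50)
```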

The algorithm for Kernel PCA is similar to the one for Dual PCA, except that in the case of Kernel PCA neither the training data nor the test data can be reconstructed.

Algorithm 3

Recover basis: Calculate K = Φ(X)^T Φ(X) using the kernel K and let V = eigenvectors of Φ(X)^T Φ(X) corresponding to the top d eigenvalues. Let Σ = diagonal matrix of square roots of the top d eigenvalues.

Encode training data: Y = U^T Φ(X) = ΣV^T, where Y is a d × t matrix of encodings of the original data. This can be done since the calculation of Y does not require Φ(X) explicitly.

Reconstruct training data: X̂ = UY = UΣV^T = Φ(X)VΣ^{-1}ΣV^T = Φ(X)VV^T. Φ(X) is unknown, so reconstruction cannot be done.

Encode test example: y = U^T Φ(x) = Σ^{-1}V^T Φ(X)^T Φ(x). This can be calculated because the quantity Φ(X)^T Φ(x) = K(X, x).

Reconstruct test example: x̂ = Uy = UU^T Φ(x) = Φ(X)VΣ^{-2}V^T Φ(X)^T Φ(x). Φ(X) is unknown, so reconstruction cannot be done.

Table 2: Kernel PCA Algorithm
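Following Table 2, here is a minimal NumPy sketch of the steps of Algorithm 3 that can actually be carried out (recover basis, encode training data, encode a test example). The function names and the degree-2 polynomial kernel are my own choices, and the sketch assumes the centering condition ∑_i Φ(x_i) = 0 that the notes defer.

```python
import numpy as np

def kernel_pca_fit(K, d):
    """Algorithm 3, "Recover basis": eigendecompose the t x t Gram matrix K.

    Assumes the feature vectors are centered (sum_i Phi(x_i) = 0), the
    condition deferred in the notes.
    """
    evals, evecs = np.linalg.eigh(K)
    idx = np.argsort(evals)[::-1][:d]
    V = evecs[:, idx]                          # t x d
    Sigma = np.diag(np.sqrt(evals[idx]))       # d x d
    return V, Sigma

def kernel_pca_encode_train(V, Sigma):
    """Y = U^T Phi(X) = Sigma V^T (d x t); Phi(X) itself is never needed."""
    return Sigma @ V.T

def kernel_pca_encode_test(V, Sigma, k_x):
    """y = Sigma^{-1} V^T Phi(X)^T Phi(x), where k_x = K(X, x) has length t."""
    return np.linalg.inv(Sigma) @ V.T @ k_x

# Usage with a degree-2 polynomial kernel K(x, y) = (x^T y + 1)^2.
rng = np.random.default_rng(3)
X = rng.standard_normal((20, 50))              # 20-dimensional inputs, 50 examples
K = (X.T @ X + 1.0) ** 2                       # t x t Gram matrix
V, Sigma = kernel_pca_fit(K, d=3)
Y = kernel_pca_encode_train(V, Sigma)          # 3 x 50 embedding of the training set
x = rng.standard_normal(20)                    # a test point
k_x = (X.T @ x + 1.0) ** 2                     # K(X, x), length t
y = kernel_pca_encode_test(V, Sigma, k_x)
print(Y.shape, y.shape)                        # (3, 50) (3,)
```

As the table notes, the two reconstruction steps are omitted: they would require Φ(X) itself, which is never available when only K can be evaluated.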