Principal Component Analysis

CSci 5525: Machine Learning, Dec 3, 2008

The Main Idea

- Given a dataset $X = \{x_1, \ldots, x_N\}$
- Find a low-dimensional linear projection
- Two possible formulations:
  - The variance in the low-dimensional space is maximized
  - The average projection cost is minimized
- Both formulations are equivalent

Two viewpoints

[Figure: data points $x_n$ in the $(x_1, x_2)$ plane and their projections onto the direction $u_1$, illustrating the maximum-variance and minimum-error views]

Maximum Variance Formulation

- Consider $X = \{x_1, \ldots, x_N\}$
- With $x_i \in \mathbb{R}^d$, the goal is to find a projection in $\mathbb{R}^m$, $m < d$
- Consider $m = 1$: we need a projection vector $u_1 \in \mathbb{R}^d$
- Each data point $x_n$ gets projected to $u_1^T x_n$
- Mean of the projected data: $u_1^T \bar{x}$, where $\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n$
- Variance of the projected data:
  $$\frac{1}{N} \sum_{n=1}^{N} (u_1^T x_n - u_1^T \bar{x})^2 = u_1^T S u_1, \qquad S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T$$
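
To make the variance identity concrete, here is a minimal NumPy sketch (my own illustration, not from the slides): for a random unit vector $u_1$, the variance of the projections $u_1^T x_n$ matches $u_1^T S u_1$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 500, 5
X = rng.normal(size=(N, d)) @ rng.normal(size=(d, d))   # correlated toy data

x_bar = X.mean(axis=0)
S = (X - x_bar).T @ (X - x_bar) / N          # sample covariance, 1/N convention

u1 = rng.normal(size=d)
u1 /= np.linalg.norm(u1)                     # a random unit projection vector

proj = X @ u1                                # projections u1^T x_n
var_proj = ((proj - proj.mean()) ** 2).mean()
print(np.isclose(var_proj, u1 @ S @ u1))     # True: variance equals u1^T S u1
```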

Maximum Variance Formulation (Contd.)

- Maximize $u_1^T S u_1$ w.r.t. $u_1$
- Need a constraint to prevent $\|u_1\|$ from growing without bound
- Normalization constraint: $\|u_1\|^2 = 1$
- The Lagrangian for the problem:
  $$u_1^T S u_1 + \lambda_1 (1 - u_1^T u_1)$$
- First-order necessary condition:
  $$S u_1 = \lambda_1 u_1$$
- $u_1$ must be the eigenvector of $S$ with the largest eigenvalue, since the objective value is $u_1^T S u_1 = \lambda_1$
- The eigenvector $u_1$ is called a principal component

Maximum Variance Formulation (Contd.)

- Subsequent principal components must be orthogonal to $u_1$
- Maximize $u_2^T S u_2$ s.t. $\|u_2\|^2 = 1$ and $u_2 \perp u_1$
- The solution turns out to be the second eigenvector, and so on
- The top-$m$ eigenvectors give the best $m$-dimensional projection
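
The whole procedure therefore reduces to an eigendecomposition of the sample covariance matrix. Below is a short NumPy sketch (an illustration, not part of the original slides; the function name `pca` and its return values are my own choices):

```python
import numpy as np

def pca(X, m):
    """Project the rows of X (N x d) onto the top-m principal components."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                              # center the data
    S = Xc.T @ Xc / X.shape[0]                  # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]       # indices of the m largest
    U = eigvecs[:, order]                       # d x m matrix of eigenvectors
    Z = Xc @ U                                  # N x m projected coordinates
    return Z, U, eigvals[order], x_bar

# Usage: project 4-dimensional data onto its top 2 principal components
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
Z, U, top_eigvals, x_bar = pca(X, m=2)
print(Z.shape, U.shape)   # (200, 2) (4, 2)
```

Using np.linalg.eigh exploits the symmetry of $S$; for very high-dimensional data one would typically work with an SVD of the centered data matrix instead.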

Minimum Error Formulation

- Consider a complete orthonormal basis $\{u_i\}$ in $\mathbb{R}^d$
- Each data point can be written as $x_n = \sum_{i=1}^{d} \alpha_{ni} u_i$
- Note that $\alpha_{ni} = x_n^T u_i$, so that
  $$x_n = \sum_{i=1}^{d} (x_n^T u_i) u_i$$
- Our goal is to obtain a lower-dimensional subspace, $m < d$
- A generic representation of a low-dimensional approximation:
  $$\tilde{x}_n = \sum_{i=1}^{m} z_{ni} u_i + \sum_{i=m+1}^{d} b_i u_i$$
- The coefficients $z_{ni}$ depend on the data point $x_n$
- We are free to choose $z_{ni}$, $b_i$, and $u_i$ to get $\tilde{x}_n$ close to $x_n$

Minimum Error Formulation (Contd.)

- The objective is to minimize
  $$J = \frac{1}{N} \sum_{n=1}^{N} \|x_n - \tilde{x}_n\|^2$$
- Taking the derivative w.r.t. $z_{ni}$ we get $z_{nj} = x_n^T u_j$, $j = 1, \ldots, m$
- Taking the derivative w.r.t. $b_j$ we get $b_j = \bar{x}^T u_j$, $j = m+1, \ldots, d$
- Then we have
  $$x_n - \tilde{x}_n = \sum_{i=m+1}^{d} \left\{ (x_n - \bar{x})^T u_i \right\} u_i$$
- The residual lies in the space orthogonal to the principal subspace
- The distortion measure to be minimized:
  $$J = \frac{1}{N} \sum_{n=1}^{N} \sum_{i=m+1}^{d} (x_n^T u_i - \bar{x}^T u_i)^2 = \sum_{i=m+1}^{d} u_i^T S u_i$$
- Need orthonormality constraints on the $u_i$ to prevent the $u_i = 0$ solution
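
As a sanity check on these formulas, the following sketch (my own, assuming NumPy and an arbitrary orthonormal basis) plugs in the optimal $z_{nj}$ and $b_j$ and verifies that the residual $x_n - \tilde{x}_n$ has no component in the principal subspace and that $J$ equals $\sum_{i=m+1}^{d} u_i^T S u_i$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, m = 300, 6, 2
X = rng.normal(size=(N, d)) @ rng.normal(size=(d, d))
x_bar = X.mean(axis=0)

# Any orthonormal basis of R^d (here a random one, not yet the eigenvectors of S)
U, _ = np.linalg.qr(rng.normal(size=(d, d)))

Z = X @ U[:, :m]                     # optimal z_nj = x_n^T u_j,   j = 1..m
b = x_bar @ U[:, m:]                 # optimal b_j = xbar^T u_j,   j = m+1..d
X_tilde = Z @ U[:, :m].T + b @ U[:, m:].T

residual = X - X_tilde
# Residual has no component along u_1..u_m (orthogonal to the principal subspace)
print(np.allclose(residual @ U[:, :m], 0.0))     # True

# Distortion J equals the sum over discarded directions of u_i^T S u_i
S = (X - x_bar).T @ (X - x_bar) / N
J = (residual ** 2).sum(axis=1).mean()
print(np.isclose(J, sum(U[:, i] @ S @ U[:, i] for i in range(m, d))))  # True
```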

Minimum Error Formulation (Contd.)

- Consider the special case $d = 2$, $m = 1$
- The Lagrangian of the objective:
  $$L = u_2^T S u_2 + \lambda_2 (1 - u_2^T u_2)$$
- The first-order condition is $S u_2 = \lambda_2 u_2$
- In general, the condition is $S u_i = \lambda_i u_i$
- The minimum is attained by the eigenvectors corresponding to the smallest $(d - m)$ eigenvalues
- So the principal subspace $u_i$, $i = 1, \ldots, m$, is spanned by the eigenvectors with the largest eigenvalues
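
A quick numerical check (again a sketch of my own, not from the slides): with the eigenvector basis, the distortion $J$ equals the sum of the $(d - m)$ smallest eigenvalues of $S$, and it is no larger than the distortion obtained from a random orthonormal basis.

```python
import numpy as np

rng = np.random.default_rng(3)
N, d, m = 300, 6, 2
X = rng.normal(size=(N, d)) @ rng.normal(size=(d, d))
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / N

def distortion(U_basis):
    """J for the optimal z, b when the first m columns of U_basis span the subspace."""
    return sum(U_basis[:, i] @ S @ U_basis[:, i] for i in range(m, d))

eigvals, eigvecs = np.linalg.eigh(S)             # ascending eigenvalues
U_pca = eigvecs[:, ::-1]                          # largest eigenvalues first
U_rand, _ = np.linalg.qr(rng.normal(size=(d, d))) # a random orthonormal basis

print(np.isclose(distortion(U_pca), eigvals[:d - m].sum()))  # True: sum of smallest eigenvalues
print(distortion(U_pca) <= distortion(U_rand))                # True: the PCA basis is no worse
```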

Kernel PCA

- In PCA, the principal components $u_i$ are given by $S u_i = \lambda_i u_i$, where
  $$S = \frac{1}{N} \sum_{n=1}^{N} x_n x_n^T$$
- Consider a feature mapping $\phi(x)$
- We want to implicitly perform PCA in the feature space
- Assume the features have zero mean: $\sum_n \phi(x_n) = 0$

Kernel PCA (Contd.)

- The sample covariance matrix in the feature space:
  $$C = \frac{1}{N} \sum_{n=1}^{N} \phi(x_n) \phi(x_n)^T$$
- The eigenvectors are given by $C v_i = \lambda_i v_i$
- We want to avoid computing $C$ explicitly
- Note that the eigenvectors satisfy
  $$\frac{1}{N} \sum_{n=1}^{N} \phi(x_n) \left\{ \phi(x_n)^T v_i \right\} = \lambda_i v_i$$
- Since the inner product is a scalar, we have
  $$v_i = \sum_{n=1}^{N} a_{in} \phi(x_n)$$

Kernel PCA (Contd.)

- Substituting back into the eigenvalue equation:
  $$\frac{1}{N} \sum_{n=1}^{N} \phi(x_n) \phi(x_n)^T \sum_{m=1}^{N} a_{im} \phi(x_m) = \lambda_i \sum_{n=1}^{N} a_{in} \phi(x_n)$$
- Multiplying both sides by $\phi(x_l)^T$ and using $K(x_n, x_m) = \phi(x_n)^T \phi(x_m)$, we have
  $$\frac{1}{N} \sum_{n=1}^{N} K(x_l, x_n) \sum_{m=1}^{N} a_{im} K(x_n, x_m) = \lambda_i \sum_{n=1}^{N} a_{in} K(x_l, x_n)$$
- In matrix notation, we have $K^2 a_i = \lambda_i N K a_i$
- Except for eigenvectors with zero eigenvalues, we can instead solve
  $$K a_i = \lambda_i N a_i$$

Kernel PCA (Contd.)

- Since the original $v_i$ are normalized, we have
  $$1 = v_i^T v_i = a_i^T K a_i = \lambda_i N a_i^T a_i$$
- This gives a normalization condition for the $a_i$
- Compute the $a_i$ by solving the eigenvalue decomposition of $K$
- The projection of a point $x$ onto the $i$-th component is
  $$y_i(x) = \phi(x)^T v_i = \sum_{n=1}^{N} a_{in} \phi(x)^T \phi(x_n) = \sum_{n=1}^{N} a_{in} K(x, x_n)$$
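
Putting the steps together, here is a minimal kernel PCA sketch in NumPy for the training points, assuming zero-mean features as on the earlier slide (centering is handled on a later slide); the helper names `rbf_kernel` and `kernel_pca` and the choice of an RBF kernel are my own:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """K(a, b) = exp(-gamma * ||a - b||^2) for all pairs of rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_pca(X, m, gamma=1.0):
    """Top-m kernel PCA projections of the training points.

    Assumes the features already have zero mean in feature space."""
    N = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    eigvals, eigvecs = np.linalg.eigh(K)           # K a_i = (N * lambda_i) a_i
    order = np.argsort(eigvals)[::-1][:m]
    a = eigvecs[:, order]                           # columns are the a_i (unit norm)
    lam = eigvals[order] / N                        # lambda_i
    a = a / np.sqrt(N * lam)                        # enforce 1 = lambda_i N a_i^T a_i
    Y = K @ a                                       # y_i(x_l) = sum_n a_in K(x_l, x_n)
    return Y, a, lam

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))
Y, a, lam = kernel_pca(X, m=2, gamma=0.5)
print(Y.shape)    # (100, 2)
```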

Illustration of Kernel PCA (Feature Space)

[Figure: feature space with axes $\phi_1$ and $\phi_2$ and the principal eigenvector $v_1$]

Illustration of Kernel PCA (Data Space)

[Figure: the corresponding picture in the original data space, with axes $x_1$ and $x_2$]

Dimensionality of Projection

- Original $x_i \in \mathbb{R}^d$, feature $\phi(x_i) \in \mathbb{R}^D$
- Possibly $D \gg d$, so the number of principal components can be greater than $d$
- However, the number of nonzero eigenvalues cannot exceed $N$
- The covariance matrix $C$ has rank at most $N$, even if $D \gg d$
- Kernel PCA involves an eigenvalue decomposition of an $N \times N$ matrix

Kernel PCA: Non-zero Mean

- In general the features need not have zero mean
- Note that the features cannot be explicitly centered
- The centered features would be of the form
  $$\tilde{\phi}(x_n) = \phi(x_n) - \frac{1}{N} \sum_{l=1}^{N} \phi(x_l)$$
- The corresponding Gram matrix is
  $$\tilde{K} = K - \mathbf{1}_N K - K \mathbf{1}_N + \mathbf{1}_N K \mathbf{1}_N,$$
  where $\mathbf{1}_N$ denotes the $N \times N$ matrix with every entry equal to $1/N$
- Use $\tilde{K}$ in the basic kernel PCA formulation
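
A small sketch of this centering step (assuming NumPy and a precomputed kernel matrix; the helper name `center_gram` is mine):

```python
import numpy as np

def center_gram(K):
    """Return K_tilde = K - 1_N K - K 1_N + 1_N K 1_N, with 1_N the N x N matrix of 1/N."""
    N = K.shape[0]
    one_N = np.full((N, N), 1.0 / N)
    return K - one_N @ K - K @ one_N + one_N @ K @ one_N

# Usage: center a toy positive semidefinite Gram matrix before the eigenproblem
F = np.random.default_rng(5).normal(size=(50, 3))
K = F @ F.T
K_tilde = center_gram(K)
print(np.allclose(K_tilde.sum(axis=0), 0.0))   # True: rows and columns sum to zero
```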

61 Kernel PCA on Artificial Data

Kernel PCA Properties

- Computes an eigenvalue decomposition of an $N \times N$ matrix
- Standard PCA computes one for a $d \times d$ matrix
- For large datasets ($N \gg d$), kernel PCA is more expensive
- Standard PCA gives a projection onto a low-dimensional principal subspace:
  $$\hat{x}_n = \sum_{i=1}^{m} (x_n^T u_i) u_i$$
- Kernel PCA cannot do this:
  - $\phi(x)$ forms a $d$-dimensional manifold in $\mathbb{R}^D$
  - The PCA projection $\hat{\phi}$ of $\phi(x)$ need not lie in this manifold
  - So it may not have a pre-image $\hat{x}$ in the data space
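
As a sanity check on the relationship between the two algorithms, the sketch below (my own illustration, not from the slides) runs kernel PCA with a plain linear kernel on centered data and confirms that it reproduces the standard PCA projections up to the sign of each component:

```python
import numpy as np

rng = np.random.default_rng(6)
N, d, m = 80, 5, 2
X = rng.normal(size=(N, d)) @ rng.normal(size=(d, d))
Xc = X - X.mean(axis=0)

# Standard PCA: eigendecomposition of the d x d covariance matrix
S = Xc.T @ Xc / N
s_vals, s_vecs = np.linalg.eigh(S)
U = s_vecs[:, ::-1][:, :m]
Z_pca = Xc @ U                                   # N x m projections

# Kernel PCA with a linear kernel: eigendecomposition of the N x N Gram matrix
K = Xc @ Xc.T
k_vals, k_vecs = np.linalg.eigh(K)
a = k_vecs[:, ::-1][:, :m]
lam = k_vals[::-1][:m] / N                       # K a_i = N lambda_i a_i
a = a / np.sqrt(N * lam)                         # normalization 1 = lambda_i N a_i^T a_i
Z_kpca = K @ a                                   # projections y_i(x_n)

# The two sets of projections agree up to the sign of each component
print(np.allclose(np.abs(Z_pca), np.abs(Z_kpca)))   # True
```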
