Unsupervised dimensionality reduction


1 Unsupervised dimensionality reduction
Guillaume Obozinski
Ecole des Ponts - ParisTech
SOCN course 2014

2 Outline
1 PCA
2 Kernel PCA
3 Multidimensional scaling
4 Laplacian Eigenmaps
5 Locally Linear Embedding

3 PCA

4 A direction that maximizes the variance

Data are points in $\mathbb{R}^d$. We look for a direction $v$ in $\mathbb{R}^d$ such that the variance of the signals projected on $v$ is maximized:
$$\mathrm{Var}\big((v^\top x_i)_{i=1,\dots,n}\big) = \frac{1}{n}\sum_{i=1}^n (v^\top x_i)^2 = \frac{1}{n}\sum_{i=1}^n v^\top x_i x_i^\top v = v^\top \Big(\frac{1}{n}\sum_{i=1}^n x_i x_i^\top\Big) v = v^\top \Sigma\, v.$$
Need to solve
$$\max_{\|v\|_2 = 1} \ v^\top \Sigma\, v.$$
Solution: the eigenvector associated with the largest eigenvalue of $\Sigma$.
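A minimal numpy sketch of this computation (not from the slides; the toy data and variable names are illustrative): center the data, form the empirical covariance, and take the eigenvector of its largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # n = 200 toy points in R^d, d = 5
X = X - X.mean(axis=0)                     # center the data

Sigma = X.T @ X / X.shape[0]               # empirical covariance (d x d)
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
v = eigvecs[:, -1]                         # eigenvector of the largest eigenvalue

# v maximizes v' Sigma v over unit-norm vectors; the maximum is the top eigenvalue
assert np.isclose(v @ Sigma @ v, eigvals[-1])
```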

5 Principal directions

Assume the design matrix is centered, i.e. $X^\top \mathbf{1} = 0$.

Principal directions as eigenvectors of the covariance:
Consider the eigenvalue decomposition of $\Sigma = X^\top X$:
$$\Sigma = V S^2 V^\top \quad \text{with } S = \mathrm{Diag}(s_1, \dots, s_n) \text{ and } s_1 \ge \dots \ge s_n.$$
The principal directions are the columns of $V$.

Principal directions as singular vectors of the design matrix:
Consider the singular value decomposition of $X$:
$$X = U S V^\top.$$
The principal directions are the right singular vectors.

6 Principal components

Obtained by projection of the rows of $X$ on $V$. But
$$X V = U S V^\top V = U S.$$
So the principal components are obtained from the left singular vectors and the singular values.
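The identity $XV = US$ can be checked numerically. The sketch below (illustrative, assuming a centered design matrix X as above) computes the principal components both ways.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X = X - X.mean(axis=0)                              # centered design matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)    # X = U diag(s) V'
components_from_svd = U * s                         # U S
components_from_projection = X @ Vt.T               # X V: projection of the rows of X on V

# both computations use the same V, so they agree up to numerical error
assert np.allclose(components_from_svd, components_from_projection)
```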

7 Kernel PCA

8 Centering implicitly in feature space

Assume that we use a mapping $\varphi$ so that the representation of the data is the design matrix
$$\Phi = \begin{pmatrix} \varphi(x_1)^\top \\ \vdots \\ \varphi(x_n)^\top \end{pmatrix}, \qquad \bar{\varphi} = \frac{1}{n}\sum_{i=1}^n \varphi(x_i) = \frac{1}{n}\Phi^\top \mathbf{1}.$$
So that if $\tilde{\varphi}(x_i) = \varphi(x_i) - \bar{\varphi}$, then
$$\tilde{\Phi} = \Phi - \frac{1}{n}\mathbf{1}\mathbf{1}^\top \Phi = \Big(I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top\Big)\Phi.$$
Finally, the centered kernel matrix is computed as
$$\tilde{K} = \tilde{\Phi}\tilde{\Phi}^\top = H K H \quad \text{with } H = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top.$$
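A small sketch of this centering step (illustrative; the helper name and the linear-kernel sanity check are assumptions, not part of the slides):

```python
import numpy as np

def center_kernel(K):
    """Return H K H with H = I - (1/n) 1 1', the kernel matrix of the centered features."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

# check on a linear kernel: centering K = X X' implicitly in feature space
# matches the kernel of the explicitly centered data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Xc = X - X.mean(axis=0)
assert np.allclose(center_kernel(X @ X.T), Xc @ Xc.T)
```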

9 Principal function in a RKHS (Schölkopf et al., 1998)

Find a function $f$ with $\|f\|_{\mathcal{H}} = 1$ that maximizes $\sum_{i=1}^n \langle f, h_{x_i}\rangle_{\mathcal{H}}^2$. Equivalently,
$$\max_f \ \sum_{i=1}^n f(x_i)^2 \quad \text{s.t.} \quad \|f\|_{\mathcal{H}}^2 \le 1.$$
By the representer theorem, $f(x) = \sum_{j=1}^n \alpha_j K(x, x_j)$. So
$$\sum_{i=1}^n f(x_i)^2 = \sum_{i=1}^n \Big(\sum_{j=1}^n \alpha_j K(x_i, x_j)\Big)^2 = \sum_{j,j'=1}^n \alpha_j \alpha_{j'} \sum_{i=1}^n K(x_i, x_j)\, K(x_i, x_{j'}) = \alpha^\top K K \alpha.$$
So the problem can be written as
$$\max_\alpha \ \alpha^\top K K \alpha \quad \text{s.t.} \quad \alpha^\top K \alpha \le 1.$$

10 Solution of kernel PCA

Write $K = U S^2 U^\top$. If $\beta = U^\top \alpha$, then the problem is formulated as
$$\max_\beta \ \sum_{i} \beta_i^2 s_i^4 \quad \text{s.t.} \quad \sum_{i} \beta_i^2 s_i^2 \le 1.$$
This is attained for $\beta = \big(\tfrac{1}{s_1}, 0, \dots, 0\big)$ and thus $\alpha = \tfrac{1}{s_1} u_1$.

So the first principal function is
$$f(x) = \frac{1}{s_1}\sum_{i=1}^n U_{i1}\, K(x_i, x),$$
and the $k$-th principal function is
$$f(x) = \frac{1}{s_k}\sum_{i=1}^n U_{ik}\, K(x_i, x).$$
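The sketch below (an assumed implementation, with an RBF kernel and an arbitrary bandwidth) builds the centered kernel matrix, takes its eigendecomposition $\tilde{K} = U S^2 U^\top$, and evaluates the $k$-th principal function as on the slide; for simplicity it does not re-center the kernel evaluations at new points.

```python
import numpy as np

def rbf_kernel(A, B, h=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / h)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))

n = X.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
Kc = H @ rbf_kernel(X, X) @ H                    # centered kernel matrix

eigvals, U = np.linalg.eigh(Kc)                  # ascending eigenvalues
U, eigvals = U[:, ::-1], eigvals[::-1]           # reorder: largest first
s = np.sqrt(np.clip(eigvals, 0.0, None))         # Kc = U S^2 U'

def principal_function(x_new, k):
    """k-th principal function f(x) = (1/s_k) sum_i U_ik K(x_i, x) at new points."""
    k_new = rbf_kernel(X, x_new)                 # K(x_i, x), shape (n, m)
    return (U[:, k] / s[k]) @ k_new

scores = principal_function(X, 0)                # first kernel principal component scores
```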

11 Multidimensional scaling

12 Multidimensional scaling

Goal: given a collection of not necessarily Euclidean distances $\delta_{ij}$ between pairs of points indexed by $\{1, \dots, n\}$, construct a collection of points $y_i$ in a Euclidean space such that $\|y_i - y_j\|_2 \approx \delta_{ij}$.

Original formulation: minimize the so-called stress function
$$\min_Y \ \sum_{ij} \big(\|y_i - y_j\|_2 - \delta_{ij}\big)^2.$$
Classical formulation:
$$\min_Y \ \sum_{ij} \big(\|y_i - y_j\|_2^2 - \delta_{ij}^2\big)^2.$$

13 Centered kernel matrix from a Euclidean distance matrix

Lemma. If $D_2 = (d_{ij}^2)_{1 \le i,j \le n}$ is a matrix of squared Euclidean distances, then
$$K = -\frac{1}{2} H D_2 H \quad \text{with } H = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$$
is the corresponding centered kernel matrix.

Proof:
$$d_{ij}^2 = \|\varphi(x_i) - \varphi(x_j)\|_2^2 = K_{ii} + K_{jj} - 2 K_{ij}.$$
With $\kappa = (K_{11}, \dots, K_{nn})^\top$, we have $2K = \kappa\mathbf{1}^\top + \mathbf{1}\kappa^\top - D_2$, so
$$\tilde{K} = H K H = \frac{1}{2} H\big(\kappa\mathbf{1}^\top + \mathbf{1}\kappa^\top - D_2\big)H = -\frac{1}{2} H D_2 H,$$
since $H\kappa\mathbf{1}^\top H = H\mathbf{1}\kappa^\top H = 0$ (because $H\mathbf{1} = 0$).

14 Classical MDS algorithm

Algorithm:
1 Compute $K = -\frac{1}{2} H D_2 H$.
2 Remove negative eigenvalues from $K$.
3 Solve kernel PCA on $K$.

If $D_2$ is a matrix of squared Euclidean distances, step 2 is unnecessary and it can be shown that this solves the classical MDS problem.
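A compact sketch of the three steps (illustrative; the function name and the sanity check are assumptions, not part of the slides):

```python
import numpy as np

def classical_mds(D2, dim=2):
    """Classical MDS from a matrix D2 of squared distances."""
    n = D2.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * H @ D2 @ H                              # step 1: centered kernel matrix
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    eigvals = np.clip(eigvals, 0.0, None)              # step 2: drop negative eigenvalues
    return eigvecs[:, :dim] * np.sqrt(eigvals[:dim])   # step 3: kernel PCA embedding

# sanity check: with exact squared Euclidean distances, the embedding
# reproduces the pairwise distances of the original 2-D configuration
rng = np.random.default_rng(0)
Y = rng.normal(size=(30, 2))
D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
Z = classical_mds(D2, dim=2)
assert np.allclose(D2, ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1))
```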

15 Isomap (Tenenbaum et al., 2000)

Algorithm:
1 Compute a k-NN graph on the data.
2 Compute geodesic distances on the k-NN graph, using the $\ell_2$ distance on each edge.
3 Apply classical MDS to the obtained distances.

Remarks:
- Isomap assumes that we can rely on the $\ell_2$ distance locally; it will fail if there are too many noise dimensions.
- Geodesic distances can be computed with, e.g., the Floyd-Warshall algorithm.
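A simplified sketch of the geodesic-distance step under stated assumptions (scipy available, the k-NN graph connected, distinct points at nonzero distance); Isomap then applies the classical_mds sketch above to the squared geodesic distances.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def geodesic_distances(X, k=10):
    """Geodesic distances on the k-NN graph, with l2 lengths on each edge."""
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    G = np.zeros((n, n))                         # 0 means "no edge" in the dense convention
    idx = np.argsort(D, axis=1)[:, 1:k + 1]      # k nearest neighbours, skipping self
    for i in range(n):
        G[i, idx[i]] = D[i, idx[i]]
    G = np.maximum(G, G.T)                       # symmetrize the k-NN graph
    return shortest_path(G, method='FW', directed=False)   # Floyd-Warshall

# Isomap = classical MDS on the squared geodesic distances, e.g.:
#   Z = classical_mds(geodesic_distances(X, k=10) ** 2, dim=2)
```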

16 Laplacian Eigenmaps

17 Graph Laplacians

Assume a similarity matrix $W$ is available on the data, e.g. a kernel matrix such as $W = (w_{ij})_{1\le i,j\le n}$ with
$$w_{ij} = \exp\Big(-\frac{1}{h}\|x_i - x_j\|_2^2\Big).$$
We can think of $W$ as defining a weighted graph on the data.

We say that a function $f$ is smooth on the weighted graph if its Laplacian
$$L(f) := \frac{1}{2}\sum_{ij} w_{ij}\big(f(x_i) - f(x_j)\big)^2$$
is small. We say that a vector $\mathbf{f}$ is smooth on the weighted graph if its Laplacian
$$L(\mathbf{f}) := \frac{1}{2}\sum_{ij} w_{ij}(f_i - f_j)^2$$
is small.

18 Laplacian and normalized Laplacian matrices

Define $D = \mathrm{Diag}(d)$ with $d_i = \sum_j w_{ij}$. We then have
$$L(\mathbf{f}) = \frac{1}{2}\sum_{ij} w_{ij}(f_i - f_j)^2 = \frac{1}{2}\Big(\sum_{ij} w_{ij} f_i^2 + \sum_{ij} w_{ij} f_j^2 - 2\sum_{ij} w_{ij} f_i f_j\Big) = \sum_i d_i f_i^2 - \sum_{ij} w_{ij} f_i f_j = \mathbf{f}^\top D \mathbf{f} - \mathbf{f}^\top W \mathbf{f} = \mathbf{f}^\top L\, \mathbf{f}.$$
Laplacian matrix: $L := D - W$.
Normalized Laplacian matrix:
$$\tilde{L} := D^{-\frac{1}{2}} L D^{-\frac{1}{2}} = I - D^{-\frac{1}{2}} W D^{-\frac{1}{2}}.$$
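The sketch below (illustrative, with an assumed Gaussian similarity and bandwidth h) builds $W$, $D$, $L = D - W$ and the normalized Laplacian, and checks the identity $L(\mathbf{f}) = \mathbf{f}^\top L \mathbf{f}$ numerically.

```python
import numpy as np

def graph_laplacians(X, h=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / h)                          # w_ij = exp(-||x_i - x_j||^2 / h)
    d = W.sum(axis=1)                            # degrees d_i = sum_j w_ij
    D = np.diag(d)
    L = D - W                                    # Laplacian matrix
    L_norm = np.diag(d ** -0.5) @ L @ np.diag(d ** -0.5)   # I - D^{-1/2} W D^{-1/2}
    return W, D, L, L_norm

# the smoothness of a vector f on the weighted graph is f' L f
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
W, D, L, L_norm = graph_laplacians(X)
f = rng.normal(size=40)
assert np.isclose(f @ L @ f, 0.5 * (W * (f[:, None] - f[None, :]) ** 2).sum())
```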

19 Laplacian embeddings (Belkin and Niyogi, 2001)

Principle: given a weight matrix $W$, find an embedding $y_i \in \mathbb{R}^K$ for point $i$ that, subject to scaling and centering constraints on the $y_i$, solves
$$\min_Y \ \sum_{ij} w_{ij}\|y_i - y_j\|_2^2 \quad \text{with } Y = [y_1 \ \dots \ y_n]^\top.$$
We have
$$\sum_{ij} w_{ij}\|y_i - y_j\|_2^2 = \sum_{ij} w_{ij}\sum_{k=1}^K (Y_{ik} - Y_{jk})^2 = \sum_{k=1}^K \sum_{ij} w_{ij}(Y_{ik} - Y_{jk})^2 = 2\sum_{k=1}^K Y_k^\top L\, Y_k = 2\,\mathrm{tr}(Y^\top L\, Y),$$
where $Y_k$ denotes the $k$-th column of $Y$ (the constant factor 2 does not affect the minimizer).

20 Laplacian embedding formulation

$$\min_Y \ \mathrm{tr}(Y^\top L\, Y) \quad \text{s.t.} \quad Y^\top D\, Y = I, \quad Y^\top D\, \mathbf{1} = 0.$$
With the change of variable $\tilde{Y} = D^{\frac{1}{2}} Y$, $\tilde{Y}$ solves
$$\min_{\tilde{Y}} \ \mathrm{tr}(\tilde{Y}^\top \tilde{L}\, \tilde{Y}) \quad \text{s.t.} \quad \tilde{Y}^\top \tilde{Y} = I, \quad \tilde{Y}^\top D^{\frac{1}{2}} \mathbf{1} = 0.$$
But $\tilde{L}\, D^{\frac{1}{2}}\mathbf{1} = 0$, so the columns of $\tilde{Y}$ are the eigenvectors of $\tilde{L}$ associated with the smallest eigenvalues, except for the eigenvector $D^{\frac{1}{2}}\mathbf{1}$.

Equivalently, the columns of $Y$ are the solutions of the generalized eigenvalue problem $L u = \lambda D u$ for the smallest generalized eigenvalues, except for the one associated with the eigenvector $\mathbf{1}$. The rows of the obtained matrix form the embedding.
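A minimal sketch of this step (assuming scipy and a dense similarity matrix such as the Gaussian one above): scipy.linalg.eigh solves the generalized problem $Lu = \lambda Du$ and normalizes eigenvectors so that $u^\top D u = 1$, which matches the constraint $Y^\top D Y = I$.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(W, dim=2):
    """Rows of the returned matrix are the embedding vectors y_i."""
    d = W.sum(axis=1)
    D = np.diag(d)
    L = D - W
    # generalized symmetric eigenproblem L u = lambda D u; ascending eigenvalues,
    # eigenvectors normalized so that u' D u = 1
    eigvals, eigvecs = eigh(L, D)
    # skip the first eigenvector (eigenvalue 0, proportional to 1), keep the next `dim`
    return eigvecs[:, 1:dim + 1]

# example with the Gaussian similarity of the previous slides (assumed bandwidth h = 1)
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2)
Y = laplacian_eigenmaps(W, dim=2)
```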

21 Locally Linear Embedding

22 Locally Linear Embedding (Roweis and Saul, 2000)

Let $x_1, \dots, x_n$ be a collection of vectors in $\mathbb{R}^p$.

1 Construct a k-NN graph.
2 Approximate $x_i$ by a linear combination of its neighbors, by finding the vector of weights solving the constrained linear regression
$$\min_{w_i} \ \Big\|x_i - \sum_{j \in N(i)} w_{ij} x_j\Big\|_2^2, \quad \text{with} \quad \sum_{j \in N(i)} w_{ij} = 1.$$
3 Set $w_{ij} = 0$ for $j \notin N(i)$.
4 Find a centered set of points $y_i$ in $\mathbb{R}^d$ with white covariance which minimizes
$$\sum_{i=1}^n \Big\|y_i - \sum_{j=1}^n w_{ij} y_j\Big\|_2^2.$$

23 LLE step 2: constrained regressions

$$\min_{w_i} \ \Big\|x_i - \sum_{j \in N(i)} w_{ij} x_j\Big\|_2^2, \quad \text{with} \quad \sum_{j \in N(i)} w_{ij} = 1.$$
But if $\sum_{j \in N(i)} w_{ij} = 1$, then
$$\Big\|x_i - \sum_{j \in N(i)} w_{ij} x_j\Big\|^2 = \Big\|\sum_{j \in N(i)} w_{ij}(x_i - x_j)\Big\|^2 = \sum_{j,k \in N(i)} w_{ij} w_{ik}\underbrace{(x_i - x_j)^\top (x_i - x_k)}_{K^{(i)}_{jk}}.$$
Need to solve
$$\min_u \ \frac{1}{2} u^\top K u \quad \text{s.t.} \quad u^\top \mathbf{1} = 1.$$
$$\mathcal{L}(u, \lambda) = \frac{1}{2} u^\top K u - \lambda(u^\top \mathbf{1} - 1) \quad \text{and} \quad \nabla_u \mathcal{L} = 0 \ \Leftrightarrow \ K u = \lambda \mathbf{1}.$$
Solved for
$$u = \frac{K^{-1}\mathbf{1}}{\mathbf{1}^\top K^{-1}\mathbf{1}}.$$
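A sketch of this local regression for a single point (illustrative; the helper name and the small ridge added to K for stability when the neighbourhood is larger than the ambient dimension are assumptions, not part of the slides):

```python
import numpy as np

def lle_weights(x_i, neighbours, reg=1e-3):
    """Reconstruction weights of x_i from its neighbours (rows of `neighbours`)."""
    diff = x_i - neighbours                      # rows: x_i - x_j for j in N(i)
    K = diff @ diff.T                            # K_jk = (x_i - x_j)'(x_i - x_k)
    K = K + reg * np.trace(K) / len(K) * np.eye(len(K))   # small ridge (assumption)
    u = np.linalg.solve(K, np.ones(len(K)))      # K u = 1
    return u / u.sum()                           # u = K^{-1} 1 / (1' K^{-1} 1)
```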

24 LLE step 4: final optimization problem

$$\min_{y_1,\dots,y_n} \ \sum_{i=1}^n \Big\|y_i - \sum_{j=1}^n w_{ij} y_j\Big\|^2 \quad \text{s.t.} \quad \sum_{i=1}^n y_i = 0, \quad \frac{1}{n}\sum_{i=1}^n y_i y_i^\top = I_d.$$
Equivalently, denoting $Y = [y_1 \ \dots \ y_n]^\top$, we have to solve
$$\min_Y \ \|Y - WY\|_F^2 \quad \text{s.t.} \quad \mathbf{1}^\top Y = 0, \quad \frac{1}{n} Y^\top Y = I_d,$$
or
$$\min_Y \ \mathrm{tr}\big(Y^\top (I_n - W)^\top (I_n - W)\, Y\big) \quad \text{s.t.} \quad Y^\top\mathbf{1} = 0, \quad \frac{1}{n} Y^\top Y = I_d.$$
So the columns of $\frac{1}{\sqrt{n}} Y$ are the $d$ eigenvectors of $(I_n - W)^\top (I_n - W)$ associated with the smallest non-zero eigenvalues.
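A sketch of the final step (illustrative), assuming the full n x n weight matrix W has been assembled row by row with lle_weights above and zero-padded outside each neighbourhood:

```python
import numpy as np

def lle_embedding(W, dim=2):
    """Embedding from the n x n LLE weight matrix W (rows sum to 1)."""
    n = W.shape[0]
    M = (np.eye(n) - W).T @ (np.eye(n) - W)      # (I - W)'(I - W)
    eigvals, eigvecs = np.linalg.eigh(M)         # ascending eigenvalues
    # skip the eigenvector with eigenvalue ~0 (the constant vector),
    # keep the next `dim`; scale so that (1/n) Y' Y = I
    return eigvecs[:, 1:dim + 1] * np.sqrt(n)    # rows: embedding vectors y_i
```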

25 References

Belkin, M. and Niyogi, P. (2001). Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, volume 14.
Roweis, S. T. and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500).
Schölkopf, B., Smola, A., and Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5).
Tenenbaum, J. B., De Silva, V., and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500).
