# Unsupervised dimensionality reduction


1 Unsupervised dimensionality reduction. Guillaume Obozinski, École des Ponts ParisTech, SOCN course 2014. (30 slides)

2 Outline

1. PCA
2. Kernel PCA
3. Multidimensional scaling
4. Laplacian Eigenmaps
5. Locally Linear Embedding

3 PCA

4 A direction that maximizes the variance

Data are points in $\mathbb{R}^d$. We look for a direction $v \in \mathbb{R}^d$ such that the variance of the data projected on $v$ is maximized:

$$\mathrm{Var}\big((v^\top x_i)_{1 \le i \le n}\big) = \frac{1}{n} \sum_{i=1}^n (v^\top x_i)^2 = v^\top \Big( \frac{1}{n} \sum_{i=1}^n x_i x_i^\top \Big) v = v^\top \Sigma v$$

We therefore need to solve

$$\max_{\|v\|_2 = 1} v^\top \Sigma v$$

Solution: the eigenvector associated with the largest eigenvalue of $\Sigma$.

5 Principal directions

Assume the design matrix is centered, i.e. $X^\top \mathbf{1} = 0$.

Principal directions as eigenvectors of the covariance. Consider the eigenvalue decomposition of $\Sigma = X^\top X$:

$$\Sigma = V S^2 V^\top, \qquad \text{with } S = \mathrm{Diag}(s_1, \ldots, s_n) \text{ and } s_1 \ge \cdots \ge s_n.$$

The principal directions are the columns of $V$.

Principal directions as singular vectors of the design matrix. Consider the singular value decomposition of $X$:

$$X = U S V^\top.$$

The principal directions are the right singular vectors.

6 Principal components

Obtained by projecting the rows of $X$ on $V$. But

$$X V = U S V^\top V = U S.$$

So the principal components are obtained from the left singular vectors and the singular values.
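
As a minimal numpy sketch of the two slides above (random data used purely for illustration), the principal directions and components can be read off the SVD of the centered design matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)           # center the design matrix so that X^T 1 = 0

# SVD of the centered design matrix: X = U S V^T
U, s, Vt = np.linalg.svd(X, full_matrices=False)

principal_directions = Vt.T      # columns of V (right singular vectors)
principal_components = U * s     # U S = X V, the projected data

# X V equals U S, as derived above
assert np.allclose(X @ Vt.T, principal_components)
```

The same quantities can be obtained from the eigendecomposition of $X^\top X$, but the SVD route avoids forming the covariance matrix explicitly.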

7 Kernel PCA

8 Centering implicitly in feature space

Assume that we use a mapping $\varphi$ so that the representation of the data is the design matrix

$$\Phi = \begin{bmatrix} \varphi(x_1)^\top \\ \vdots \\ \varphi(x_n)^\top \end{bmatrix}, \qquad \bar{\varphi} = \frac{1}{n} \sum_{i=1}^n \varphi(x_i) = \frac{1}{n} \Phi^\top \mathbf{1}.$$

So if $\tilde{\varphi}(x_i) = \varphi(x_i) - \bar{\varphi}$, then

$$\tilde{\Phi} = \Phi - \frac{1}{n} \mathbf{1}\mathbf{1}^\top \Phi = \Big( I_n - \frac{1}{n} \mathbf{1}\mathbf{1}^\top \Big) \Phi.$$

Finally, the centered kernel matrix is computed as

$$\tilde{K} = \tilde{\Phi} \tilde{\Phi}^\top = H K H, \qquad \text{with } H = I_n - \frac{1}{n} \mathbf{1}\mathbf{1}^\top.$$
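
A small numpy sketch of the double centering $\tilde{K} = HKH$, using the linear kernel (for which the result can be checked directly against centering the data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

# Gram matrix for the linear kernel (any kernel matrix centers the same way)
K = X @ X.T

n = K.shape[0]
H = np.eye(n) - np.ones((n, n)) / n   # centering matrix H = I - (1/n) 11^T
K_centered = H @ K @ H

# for the linear kernel, HKH is the Gram matrix of the centered data
Xc = X - X.mean(axis=0)
assert np.allclose(K_centered, Xc @ Xc.T)
```

For a nonlinear kernel the same two matrix products center the data implicitly in feature space, without ever computing $\varphi$.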

9 Principal function in an RKHS (Schölkopf et al., 1998)

Find a function $f$ with $\|f\|_{\mathcal{H}} = 1$ that maximizes $\sum_{i=1}^n \langle f, k_{x_i} \rangle_{\mathcal{H}}^2$. Equivalently,

$$\max_f \; \sum_{i=1}^n f(x_i)^2 \quad \text{s.t.} \quad \|f\|_{\mathcal{H}}^2 \le 1.$$

By the representer theorem, $f(x) = \sum_{j=1}^n \alpha_j K(x, x_j)$. So

$$\sum_{i=1}^n f(x_i)^2 = \sum_{i=1}^n \Big( \sum_{j=1}^n \alpha_j K(x_i, x_j) \Big)^2 = \sum_{j,j'=1}^n \alpha_j \alpha_{j'} \sum_{i=1}^n K(x_i, x_j) K(x_i, x_{j'}) = \alpha^\top K K \alpha.$$

So the problem can be written as

$$\max_\alpha \; \alpha^\top K K \alpha \quad \text{s.t.} \quad \alpha^\top K \alpha \le 1.$$

10 Solution of kernel PCA

Write $K = U S^2 U^\top$. If $\beta = U^\top \alpha$, then the problem is formulated as

$$\max_\beta \; \sum_i \beta_i^2 s_i^4 \quad \text{s.t.} \quad \sum_i \beta_i^2 s_i^2 \le 1.$$

This is attained for $\beta = \big( \tfrac{1}{s_1}, 0, \ldots, 0 \big)$ and thus $\alpha = \tfrac{1}{s_1} u_1$. So the first principal function is

$$f(x) = \frac{1}{s_1} \sum_{i=1}^n U_{i1} K(x_i, x),$$

and the $k$-th principal function is

$$f(x) = \frac{1}{s_k} \sum_{i=1}^n U_{ik} K(x_i, x).$$
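
A numpy sketch of the whole kernel PCA recipe (RBF kernel with an arbitrary bandwidth `h`, random data for illustration): center the kernel matrix, eigendecompose it, and evaluate the principal functions on the training points.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))

# RBF kernel matrix; the bandwidth h is a free parameter of this sketch
h = 1.0
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / h)

# center the kernel matrix: K <- H K H
n = K.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
Kc = H @ K @ H

# eigendecomposition K = U S^2 U^T (eigh returns ascending eigenvalues)
eigvals, U = np.linalg.eigh(Kc)
eigvals, U = eigvals[::-1], U[:, ::-1]
s = np.sqrt(np.clip(eigvals, 0.0, None))

# k-th principal function at a training point x_i:
#   f_k(x_i) = (1/s_k) sum_j U_{jk} K(x_j, x_i) = (1/s_k) (Kc U)_{ik}
components = Kc @ U[:, :2] / s[:2]

# on the training points this reduces to s_k U_{ik}
assert np.allclose(components, U[:, :2] * s[:2])
```

Evaluating the same sum with kernel values $K(x_i, x)$ for a new point $x$ gives the out-of-sample projection.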

11 Multidimensional scaling

12 Multidimensional scaling

Goal: given a collection of (not necessarily Euclidean) distances $\delta_{ij}$ between pairs of points indexed by $\{1, \ldots, n\}$, construct a collection of points $y_i$ in a Euclidean space such that

$$\|y_i - y_j\|_2 \approx \delta_{ij}.$$

Original formulation: minimize the so-called stress function

$$\min_Y \; \sum_{ij} \big( \|y_i - y_j\|_2 - \delta_{ij} \big)^2.$$

Classical formulation:

$$\min_Y \; \sum_{ij} \big( \|y_i - y_j\|_2^2 - \delta_{ij}^2 \big)^2.$$

13 Centered kernel matrix from a Euclidean distance matrix

Lemma. If $D_2 = \big( d_{ij}^2 \big)_{1 \le i,j \le n}$ is a matrix of squared Euclidean distances, then

$$K = -\frac{1}{2} H D_2 H, \qquad \text{with } H = I_n - \frac{1}{n} \mathbf{1}\mathbf{1}^\top,$$

is the corresponding centered kernel matrix.

Proof: $d_{ij}^2 = \|\varphi(x_i) - \varphi(x_j)\|_2^2 = K_{ii} + K_{jj} - 2 K_{ij}$. With $\kappa = (K_{11}, \ldots, K_{nn})^\top$, we have

$$2K = \kappa \mathbf{1}^\top + \mathbf{1} \kappa^\top - D_2,$$

and since $H \mathbf{1} = 0$,

$$\tilde{K} = H K H = \frac{1}{2} H \big( \kappa \mathbf{1}^\top + \mathbf{1} \kappa^\top - D_2 \big) H = -\frac{1}{2} H D_2 H.$$

14 Classical MDS algorithm

Algorithm:
1. Compute $K = -\frac{1}{2} H D_2 H$.
2. Remove the negative eigenvalues from $K$.
3. Solve kernel PCA on $K$.

If $D_2$ contains squared Euclidean distances, step 2 is unnecessary, and it can be shown that this solves the classical MDS problem.
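
A numpy sketch of the algorithm above, run on distances that really do come from planar points so the embedding can be checked exactly (the ground-truth configuration is generated only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
Y_true = rng.normal(size=(15, 2))            # ground-truth planar configuration

# squared Euclidean distance matrix D2
diff = Y_true[:, None, :] - Y_true[None, :, :]
D2 = (diff ** 2).sum(-1)

# step 1: double centering, K = -1/2 H D2 H
n = D2.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
K = -0.5 * H @ D2 @ H

# steps 2-3: eigendecompose, clip negative eigenvalues, keep top directions
eigvals, U = np.linalg.eigh(K)
eigvals, U = eigvals[::-1], U[:, ::-1]
eigvals = np.clip(eigvals, 0.0, None)
Y = U[:, :2] * np.sqrt(eigvals[:2])          # 2-D embedding

# the embedding reproduces the original distances (up to a rigid motion)
diff2 = Y[:, None, :] - Y[None, :, :]
assert np.allclose((diff2 ** 2).sum(-1), D2)
```

For genuinely non-Euclidean $\delta_{ij}$, the clipping in step 2 is what makes $K$ a valid (positive semidefinite) kernel matrix.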

15 Isomap (Tenenbaum et al., 2000)

Algorithm:
1. Compute a $k$-NN graph on the data.
2. Compute geodesic distances on the $k$-NN graph, using the $\ell_2$ distance on each edge.
3. Apply classical MDS to the obtained distances.

Remarks:
- Isomap assumes that we can rely on the $\ell_2$ distance locally; it will fail if there are too many noise dimensions.
- The geodesic distances can be computed with, e.g., the Floyd-Warshall algorithm.
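
Steps 1-2 can be sketched in numpy as follows (random data and an arbitrary neighborhood size `k` are used for illustration; step 3 is the classical MDS procedure above, applied to `geo`):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
k = 4                                        # neighborhood size (illustrative)

# pairwise Euclidean distances
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(-1))

# step 1: k-NN graph, keeping the k nearest neighbors of each point, symmetrized
n = len(X)
graph = np.full((n, n), np.inf)
np.fill_diagonal(graph, 0.0)
for i in range(n):
    nn = np.argsort(dist[i])[1:k + 1]
    graph[i, nn] = dist[i, nn]
    graph[nn, i] = dist[i, nn]

# step 2: geodesic distances via Floyd-Warshall
geo = graph.copy()
for m in range(n):
    geo = np.minimum(geo, geo[:, [m]] + geo[[m], :])
```

Entries of `geo` that remain infinite indicate a disconnected graph, in which case `k` must be increased (or each connected component embedded separately).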

16 Laplacian Eigenmaps

17 Graph Laplacians

Assume a similarity matrix $W$ is available on the data, e.g. a kernel matrix such as $W = (w_{ij})_{1 \le i,j \le n}$ with

$$w_{ij} = \exp\Big( -\frac{1}{h} \|x_i - x_j\|_2^2 \Big).$$

We can think of $W$ as defining a weighted graph on the data. We say that a function $f$ is smooth on the weighted graph if its Laplacian

$$L(f) := \frac{1}{2} \sum_{ij} w_{ij} \big( f(x_i) - f(x_j) \big)^2$$

is small. Similarly, we say that a vector $\mathbf{f}$ is smooth on the weighted graph if its Laplacian

$$L(\mathbf{f}) := \frac{1}{2} \sum_{ij} w_{ij} (f_i - f_j)^2$$

is small.

18 Laplacian and normalized Laplacian matrices

Define $D = \mathrm{Diag}(d)$ with $d_i = \sum_j w_{ij}$. We then have

$$L(\mathbf{f}) = \frac{1}{2} \sum_{ij} w_{ij} (f_i - f_j)^2 = \frac{1}{2} \Big( \sum_{ij} w_{ij} f_i^2 + \sum_{ij} w_{ij} f_j^2 - 2 \sum_{ij} w_{ij} f_i f_j \Big) = \sum_i d_i f_i^2 - \sum_{ij} w_{ij} f_i f_j = \mathbf{f}^\top D \mathbf{f} - \mathbf{f}^\top W \mathbf{f} = \mathbf{f}^\top L \mathbf{f}.$$

Laplacian matrix: $L = D - W$.

Normalized Laplacian matrix:

$$\mathcal{L} := D^{-\frac{1}{2}} L D^{-\frac{1}{2}} = I - D^{-\frac{1}{2}} W D^{-\frac{1}{2}}.$$
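
A quick numpy sketch that builds both Laplacians from an RBF similarity matrix (random data, arbitrary bandwidth `h`) and verifies the quadratic-form identity above on a random vector:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
h = 1.0                                     # bandwidth (illustrative)

# similarity matrix w_ij = exp(-||x_i - x_j||^2 / h), with zero diagonal
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / h)
np.fill_diagonal(W, 0.0)

d = W.sum(axis=1)
L = np.diag(d) - W                          # Laplacian matrix L = D - W

# check f^T L f = 1/2 sum_ij w_ij (f_i - f_j)^2 on a random vector
f = rng.normal(size=10)
quad = 0.5 * (W * (f[:, None] - f[None, :]) ** 2).sum()
assert np.allclose(f @ L @ f, quad)

# normalized Laplacian  I - D^{-1/2} W D^{-1/2}
Lnorm = np.eye(10) - W / np.sqrt(np.outer(d, d))
```

Note that $L \mathbf{1} = D\mathbf{1} - W\mathbf{1} = 0$: the constant vector is always in the kernel of $L$.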

19 Laplacian embeddings (Belkin and Niyogi, 2001)

Principle: given a weight matrix $W$, find an embedding $y_i \in \mathbb{R}^K$ for point $i$ that, subject to scaling and centering constraints on the $y_i$, solves

$$\min_Y \; \sum_{ij} w_{ij} \|y_i - y_j\|_2^2, \qquad \text{with } Y = \begin{bmatrix} y_1 & \ldots & y_n \end{bmatrix}^\top.$$

We have

$$\sum_{ij} w_{ij} \|y_i - y_j\|_2^2 = \sum_{k=1}^K \sum_{ij} w_{ij} (Y_{ik} - Y_{jk})^2 = 2 \sum_{k=1}^K Y_{\cdot k}^\top L \, Y_{\cdot k} = 2 \, \mathrm{tr}(Y^\top L Y).$$

20 Laplacian embedding formulation

$$\min_Y \; \mathrm{tr}(Y^\top L Y) \quad \text{s.t.} \quad Y^\top D Y = I, \; Y^\top D \mathbf{1} = 0.$$

With the change of variable $\tilde{Y} = D^{\frac{1}{2}} Y$, $\tilde{Y}$ solves

$$\min_{\tilde{Y}} \; \mathrm{tr}(\tilde{Y}^\top \mathcal{L} \tilde{Y}) \quad \text{s.t.} \quad \tilde{Y}^\top \tilde{Y} = I, \; \tilde{Y}^\top D^{\frac{1}{2}} \mathbf{1} = 0.$$

But $\mathcal{L} D^{\frac{1}{2}} \mathbf{1} = 0$, so the columns of $\tilde{Y}$ are the eigenvectors of $\mathcal{L}$ associated with the smallest eigenvalues, excluding the eigenvector $D^{\frac{1}{2}} \mathbf{1}$. Equivalently, the columns of $Y$ are the solutions of the generalized eigenvalue problem $L u = \lambda D u$ for the smallest generalized eigenvalues, excluding the eigenvector $\mathbf{1}$. The rows of the obtained matrix form the embedding.
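
The generalized problem $Lu = \lambda D u$ can be solved with plain numpy through exactly the change of variable above: with $v = D^{1/2}u$ it becomes an ordinary symmetric eigenproblem for $\mathcal{L}$. A sketch on random data (bandwidth and embedding dimension chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
h = 1.0

# weight matrix, degrees, Laplacian
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / h)
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
L = np.diag(d) - W

# normalized Laplacian: with v = D^{1/2} u, L u = lambda D u becomes
# (D^{-1/2} L D^{-1/2}) v = lambda v
Dm12 = 1.0 / np.sqrt(d)
Lnorm = Dm12[:, None] * L * Dm12[None, :]
eigvals, V = np.linalg.eigh(Lnorm)            # ascending eigenvalues

# drop the trivial eigenvector (eigenvalue 0, i.e. u = 1) and map back
K = 2                                         # embedding dimension
Y = Dm12[:, None] * V[:, 1:K + 1]             # rows are the embedded points

# the generalized eigenvalue relation L y = lambda D y holds per column
for k in range(K):
    assert np.allclose(L @ Y[:, k], eigvals[1 + k] * d * Y[:, k])
```

In practice sparse $W$ and a sparse eigensolver are used for large $n$; this dense version only illustrates the linear algebra.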

21 Locally Linear Embedding

22 Locally Linear Embedding (Roweis and Saul, 2000)

Let $x_1, \ldots, x_n$ be a collection of vectors in $\mathbb{R}^p$.

1. Construct a $k$-NN graph.
2. Approximate each $x_i$ by a linear combination of its neighbors, by finding the vector of weights solving the constrained linear regressions

$$\min_{w_i} \; \frac{1}{2} \Big\| x_i - \sum_{j \in N(i)} w_{ij} x_j \Big\|_2^2, \quad \text{with } \sum_{j \in N(i)} w_{ij} = 1.$$

3. Set $w_{ij} = 0$ for $j \notin N(i)$.
4. Find a centered set of points $y_i \in \mathbb{R}^d$ with white covariance which minimizes

$$\sum_{i=1}^n \Big\| y_i - \sum_{j=1}^n w_{ij} y_j \Big\|_2^2.$$

23 LLE step 2: constrained regressions

$$\min_{w_i} \; \frac{1}{2} \Big\| x_i - \sum_{j \in N(i)} w_{ij} x_j \Big\|_2^2, \quad \text{with } \sum_{j \in N(i)} w_{ij} = 1.$$

But if $\sum_{j \in N(i)} w_{ij} = 1$, then

$$\Big\| x_i - \sum_{j \in N(i)} w_{ij} x_j \Big\|_2^2 = \Big\| \sum_{j \in N(i)} w_{ij} (x_i - x_j) \Big\|_2^2 = \sum_{j,k \in N(i)} w_{ij} w_{ik} \underbrace{(x_i - x_j)^\top (x_i - x_k)}_{K^{(i)}_{jk}}.$$

So we need to solve

$$\min_u \; \frac{1}{2} u^\top K u \quad \text{s.t.} \quad u^\top \mathbf{1} = 1.$$

With the Lagrangian $\mathcal{L}(u, \lambda) = \frac{1}{2} u^\top K u - \lambda (u^\top \mathbf{1} - 1)$, setting $\nabla_u \mathcal{L} = 0$ gives $K u = \lambda \mathbf{1}$, which is solved by

$$u = \frac{K^{-1} \mathbf{1}}{\mathbf{1}^\top K^{-1} \mathbf{1}}.$$
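
The closed-form weight vector above can be sketched in numpy for a single point (the neighbor indices are a stand-in for the $k$-NN of $x_i$; random data for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))

i = 0
neighbors = [1, 2, 3]                        # stand-in for N(i), the k-NN of x_i

# local Gram matrix K_jk = (x_i - x_j)^T (x_i - x_k)
diffs = X[i] - X[neighbors]                  # rows are x_i - x_j
K = diffs @ diffs.T

# minimize 1/2 u^T K u subject to u^T 1 = 1:  u = K^{-1} 1 / (1^T K^{-1} 1)
ones = np.ones(len(neighbors))
u = np.linalg.solve(K, ones)
u = u / u.sum()

# the reconstruction error equals 1/2 u^T K u, as derived above
err = 0.5 * np.sum((X[i] - u @ X[neighbors]) ** 2)
assert np.isclose(err, 0.5 * u @ K @ u)
```

When $K^{(i)}$ is singular or ill-conditioned (more neighbors than dimensions), a small ridge term is commonly added to its diagonal before solving.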

24 LLE step 4: final optimization problem

$$\min_{y_1, \ldots, y_n} \; \sum_{i=1}^n \Big\| y_i - \sum_{j=1}^n w_{ij} y_j \Big\|_2^2 \quad \text{s.t.} \quad \sum_{i=1}^n y_i = 0, \; \frac{1}{n} \sum_{i=1}^n y_i y_i^\top = I_d.$$

Equivalently, denoting $Y = \begin{bmatrix} y_1 & \ldots & y_n \end{bmatrix}^\top$, we have to solve

$$\min_Y \; \| Y - W Y \|_F^2 \quad \text{s.t.} \quad \mathbf{1}^\top Y = 0, \; \frac{1}{n} Y^\top Y = I_d,$$

or

$$\min_Y \; \mathrm{tr}\big( Y^\top (I_n - W)^\top (I_n - W) Y \big) \quad \text{s.t.} \quad Y^\top \mathbf{1} = 0, \; \frac{1}{n} Y^\top Y = I_d.$$

So the columns of $\frac{1}{\sqrt{n}} Y$ are the $d$ eigenvectors of $(I_n - W)^\top (I_n - W)$ associated with its smallest non-zero eigenvalues.
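
Given any weight matrix $W$ whose rows sum to one (here a random stand-in for the output of step 2), the final eigenproblem can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 12, 2

# stand-in weight matrix with rows summing to one, as produced by LLE step 2
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)

# M = (I - W)^T (I - W); the embedding uses its bottom eigenvectors
M = (np.eye(n) - W).T @ (np.eye(n) - W)
eigvals, V = np.linalg.eigh(M)               # ascending eigenvalues

# eigenvalue 0 corresponds to the constant vector (rows of W sum to 1);
# skip it and take the next d eigenvectors, scaled so that (1/n) Y^T Y = I
Y = np.sqrt(n) * V[:, 1:d + 1]               # rows are the embedded points

assert np.allclose(Y.T @ np.ones(n), 0.0)    # centered
assert np.allclose(Y.T @ Y / n, np.eye(d))   # white covariance
```

Skipping the bottom eigenvector enforces the centering constraint automatically, since the constant vector spans the kernel of $I - W$.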

25 References

Belkin, M. and Niyogi, P. (2001). Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems (NIPS), volume 14.

Roweis, S. T. and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500).

Schölkopf, B., Smola, A., and Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5).

Tenenbaum, J. B., De Silva, V., and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500).
