March 13, 2008. Paper: R.R. Coifman, S. Lafon, Diffusion maps ([Coifman06]). Seminar: Learning with Graphs, Prof. Hein, Saarland University

Transcription:

March 13, 2008. Paper: R.R. Coifman, S. Lafon, Diffusion maps ([Coifman06]). Seminar: Learning with Graphs, Prof. Hein, Saarland University

Figure: example application from [LafonWWW]: meaningful geometric descriptions of data sets; data parametrization (demo); dimensionality reduction

Table of Contents: 1 Diffusion Maps 2 Application of Diffusion Maps to Spectral Clustering 3 Limit of the Graph Laplacian 4 Family of Diffusions 5 Summary and Appendix

Diffusion Process on a Graph. Definition: a kernel k : X × X → R on a data set X is
- symmetric: k(x, y) = k(y, x)
- positivity preserving: k(x, y) ≥ 0
It represents the similarity between points in X and defines the edge weight matrix W of the weighted graph (X, k).
Normalized graph Laplacian: L^(rw) = 1 − P, with
- d(x) = ∫_X k(x, y) dµ(y) (degree; discrete: d_i = Σ_{j=1}^n w_ij)
- p(x, y) = k(x, y) / d(x) (so that ∫_X p(x, y) dµ(y) = 1)
- p(x, y) is the transition probability in one time step
- the matrix P defines a Markov chain: P^t gives the probability of a transition from x to y in t steps
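A minimal numpy sketch of the discrete construction above. The Gaussian kernel, the bandwidth eps and the function name are illustrative assumptions, not from the talk; the slides only require k to be symmetric and positivity preserving.

```python
import numpy as np

def markov_matrix(X, eps=1.0):
    """One-step transition matrix P = D^{-1} W from a Gaussian kernel.

    X: (n, d) array of data points. Gaussian kernel and bandwidth eps
    are illustrative choices satisfying symmetry and positivity.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / eps)        # symmetric weights, w_ij >= 0
    d = W.sum(axis=1)            # degrees d_i = sum_j w_ij
    P = W / d[:, None]           # rows sum to 1: transition probabilities
    return P, d

# P^t gives the t-step transition probabilities, e.g.
# P, d = markov_matrix(X); P8 = np.linalg.matrix_power(P, 8)
```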

Diffusion Process on a Graph. Idea: a cluster is a region in which the probability of escaping this region is low. Figure: diffusion at time t = 8; right: P^8, left: color taken from one row of P^8

Diffusion Process on a Graph. The block structure of P^t reveals the clusters; after 64 steps the two closest clusters have merged. Figure: diffusion at time t = 64

Diffusion Process on a Graph. All clusters have merged after 1024 time steps. Figure: diffusion at time t = 1024

Diffusion Distance. Goal: relate the spectral properties of the Markov chain to the geometry of the data. Definition: the diffusion distances {D_t}_{t ∈ N} are given by D_t(x, y)^2 = || p_t(x, ·) − p_t(y, ·) ||^2_{L^2(X, dµ/π)}. D_t(x, y) is small if there is a large number of short paths between x and y. Figure: example paths for the diffusion distance [Maggioni06]

Diffusion distances can be computed using the eigenvectors ψ_l and eigenvalues λ_l of P: D_t(x, y) = ( Σ_{l ≥ 1} λ_l^{2t} (ψ_l(x) − ψ_l(y))^2 )^{1/2}. The proof uses the spectral theorem in the Hilbert space (more later) and the fact that the eigenfunctions of P are orthonormal. Using 1 = λ_0 > λ_1 ≥ λ_2 ≥ ..., the distance may be approximated with the first s eigenvalues.
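A sketch of this computation, reusing markov_matrix from the previous sketch. Since P itself is not symmetric, it is diagonalized through its symmetric conjugate A = D^{1/2} P D^{-1/2}; the resulting normalization of ψ_l matches the L^2(X, dµ/π) one up to a constant factor. All names are illustrative.

```python
import numpy as np

def spectrum_of_P(P, d):
    """Eigenvalues (descending) and right eigenvectors psi of P.

    A = D^{1/2} P D^{-1/2} is symmetric with the same eigenvalues as P;
    its orthonormal eigenvectors phi give psi = D^{-1/2} phi.
    """
    sqrt_d = np.sqrt(d)
    A = P * sqrt_d[:, None] / sqrt_d[None, :]
    lam, phi = np.linalg.eigh(A)
    order = np.argsort(lam)[::-1]        # 1 = lam_0 > lam_1 >= lam_2 >= ...
    return lam[order], phi[:, order] / sqrt_d[:, None]

def diffusion_distance(lam, psi, i, j, t):
    """D_t(x_i, x_j) via the spectral formula (constant psi_0 drops out)."""
    diff = psi[i, 1:] - psi[j, 1:]
    return np.sqrt(np.sum(lam[1:] ** (2 * t) * diff ** 2))
```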

Definition: Diffusion Map Ψ_t : X → R^s, Ψ_t(x) = (λ_1^t ψ_1(x), λ_2^t ψ_2(x), ..., λ_s^t ψ_s(x))^T. Proposition: the diffusion map Ψ_t embeds the data into the Euclidean space R^s so that, in this space, the Euclidean distance is equal to the diffusion distance (up to relative accuracy), or equivalently ||Ψ_t(x) − Ψ_t(y)|| = D_t(x, y). Remark: unlike Laplacian Eigenmaps, each dimension is weighted by the decreasing eigenvalues.
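The embedding reuses the same decomposition; Euclidean distances between rows of the result approximate D_t, truncated to the first s nontrivial eigenvalues. A sketch under the same assumed helper names as above:

```python
def diffusion_map(lam, psi, t, s):
    """Row i is Psi_t(x_i) = (lam_1^t psi_1(x_i), ..., lam_s^t psi_s(x_i))."""
    return lam[1:s + 1] ** t * psi[:, 1:s + 1]

# np.linalg.norm(Y[i] - Y[j]) with Y = diffusion_map(lam, psi, t, s)
# approximates diffusion_distance(lam, psi, i, j, t).
```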

Example of Eigenfunctions ψ. Figure: the first 4 eigenfunctions of a dumbbell-shaped manifold [Maggioni06]

Spectral clustering is one possible application of diffusion maps (see the sketch below):
1 Construct a similarity graph.
2 Compute the normalized Laplacian.
3 Solve the generalized eigenvector problem Lu = λDu.
4 Define the embedding into k-dimensional Euclidean space via diffusion maps.
5 Cluster the points y_i ∈ R^k with k-means.
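A compact sketch of the whole pipeline using the hypothetical helpers above. Note that solving Lu = λDu with L = D − W has the same eigenvectors as P = D^{-1}W (with eigenvalues 1 − λ), which is what spectrum_of_P computes.

```python
from scipy.cluster.vq import kmeans2

def spectral_clustering(X, k, t=1, eps=1.0):
    """k-means in the diffusion-map embedding (illustrative parameters)."""
    P, d = markov_matrix(X, eps)          # steps 1-2: graph and normalization
    lam, psi = spectrum_of_P(P, d)        # step 3: eigenproblem
    Y = diffusion_map(lam, psi, t, k)     # step 4: embed into R^k
    _, labels = kmeans2(Y, k, minit='points')  # step 5: k-means
    return labels
```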

Outline: 1 Diffusion Maps 2 Application of Diffusion Maps to Spectral Clustering 3 Limit of the Graph Laplacian 4 Family of Diffusions 5 Summary and Appendix

New Scenario. Points are sampled from a probability density on a submanifold of R^n. The sampling is often not related to the geometry of the manifold, and the data may be biased: e.g. more faces from one pose. Goal: recover the manifold structure regardless of the distribution of the data. Additional concepts are needed for this continuous setting. Figure: manifold with density [Learned07]

The Continuous Setting. Definition: a manifold is a space in which every point has a neighborhood that resembles Euclidean space, but whose global structure may be more complicated, e.g. a sphere or a 2-d surface. Definition: a Hilbert space is an inner product space X of functions on a space S that is complete under the norm ||f|| = sqrt(⟨f, f⟩) defined by the inner product ⟨·, ·⟩, for example the L^2 inner product ⟨f, g⟩ = ∫_S f(x) g(x) dx. Functions f can be written in a function basis (Φ_i): f = Σ_i α_i Φ_i(x). An orthonormal basis behaves as in a vector space: ||Φ_i|| = 1 for all i and ⟨Φ_i, Φ_j⟩ = 0 for i ≠ j.

The Continuous Setting. Definition: an operator L : X → X is a function of functions. Linear operator: L(λf) = λLf. Eigenfunction of an operator: Lf = λf. An operator is Hermitian (symmetric) if ⟨Lf, g⟩ = ⟨f, Lg⟩. The eigenfunctions of a Hermitian operator form an orthonormal basis of its Hilbert space X. Example: the Laplace operator.

Definition: the Laplace operator Δf = Σ_{i=1}^n ∂²f/∂x_i². Interesting properties: the first eigenfunction is constant (Σ_i ∂²c/∂x_i² = 0); the second eigenfunction has to change signs (it is orthogonal to the first) and must be merely scaled by the operator: sine and cosine, since (sin(ωx))'' = −ω² sin(ωx). Hence the eigenfunctions of Δ form a nice orthonormal basis of X. The Laplace-Beltrami operator is the extension of the ordinary Laplacian to manifolds. Problem: we only have a finite sample from a probability measure p on an m-dimensional submanifold M of R^d.

Limit of the Graph Laplacian. Theorem: let M be an m-dimensional submanifold of R^d and {X_i}_{i=1}^n a sample from a probability measure P on M with density p. Then, under several conditions on M, p and the kernel k: if the neighborhood size h → 0, the number of points in it n → ∞ and nh^{m+2}/log n → ∞, then the random walk Laplacian converges to the weighted Laplace operator, lim_{n→∞} (L_n^(rw) f)(x) ∼ (Δ_s f)(x), where Δ_s f = Δ_M f + (s/p) ⟨∇p, ∇f⟩ induces an anisotropic diffusion towards or away from regions of increasing density, depending on s.

Now we have established ourselves in the awesome world of operators in Hilbert spaces on submanifolds. The initial motivation was to analyze the geometry regardless of the sampling distribution. What is the influence of the geometry and of the density on the eigenfunctions and the spectrum of the diffusion?

We introduced a family of weighted Laplace operators that allows scaling the influence of the density via a single parameter s: Δ_s f = Δ_M f + (s/p) ⟨∇p, ∇f⟩. The smoothness functional induced by Δ_s is S(f) = ∫_M ||∇f||² p^s dV, which is to be minimized (compare the graph Laplacian analogue: Σ_{i,j} w_ij (f_i − f_j)²). Hence this functional prefers functions that are smooth in high-density regions (see board).

Construction of the Family of Diffusions. Now we have the tools to define a new kernel for the weights of the normalized graph Laplacians. But what exactly has changed in the construction of the diffusion kernels?

Standard normalized graph Laplacian:
- fix a kernel k(x, y)
- d(x) = ∫_X k(x, y) dµ(y)
- p(x, y) = k(x, y) / d(x)
Anisotropic-kernel normalized graph Laplacian:
- fix a kernel k(x, y)
- renormalize the weights into a new anisotropic kernel:
  q(x) = ∫_X k(x, y) q(y) dy
  k^(α)(x, y) = k(x, y) / (q^α(x) q^α(y))
  d^(α)(x) = ∫_X k^(α)(x, y) q(y) dy
  p^(α)(x, y) = k^(α)(x, y) / d^(α)(x)
α = 0: constructs the standard normalized weights for the graph Laplacian. α = 1: Laplace-Beltrami approximation; the normalization removes the influence of the density and recovers the geometry of the data. The relation to the weighted operator is s = 2(1 − α).
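A discrete sketch of the α-normalization on a precomputed symmetric kernel matrix K. The function name and the use of row sums as the density estimate q are illustrative assumptions.

```python
import numpy as np

def anisotropic_P(K, alpha=1.0):
    """alpha-normalized transition matrix and degrees from a symmetric K.

    alpha = 0 reproduces the standard random-walk normalization;
    alpha = 1 approximates the Laplace-Beltrami operator, removing the
    influence of the sampling density (recall s = 2(1 - alpha)).
    """
    q = K.sum(axis=1)                                # density estimate q(x)
    K_a = K / np.outer(q ** alpha, q ** alpha)       # k^(alpha)(x, y)
    d_a = K_a.sum(axis=1)                            # d^(alpha)(x)
    return K_a / d_a[:, None], d_a                   # p^(alpha)(x, y), degrees
```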

Spectral clustering with the new kernel; the only change is the extra normalization in step 1 (see the composition sketch below):
1 Construct a similarity graph and apply the α-normalization.
2 Compute the normalized Laplacian.
3 Solve the generalized eigenvector problem Lu = λDu.
4 Define the embedding into k-dimensional Euclidean space via diffusion maps.
5 Cluster the points y_i ∈ R^k with k-means.
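Composing the earlier sketches, the α-normalized pipeline only changes how P is built; this assumes the hypothetical helpers (anisotropic_P, spectrum_of_P, diffusion_map, kmeans2 import) defined above.

```python
def spectral_clustering_alpha(X, k, t=1, eps=1.0, alpha=1.0):
    """Spectral clustering with the alpha-normalized diffusion kernel."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / eps)                  # step 1: similarity graph
    P, d_a = anisotropic_P(K, alpha)       # step 1b: alpha-normalization
    lam, psi = spectrum_of_P(P, d_a)       # steps 2-3: eigenproblem
    Y = diffusion_map(lam, psi, t, k)      # step 4: embed into R^k
    _, labels = kmeans2(Y, k, minit='points')  # step 5: k-means
    return labels
```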

Embeddings via the Laplace-Beltrami Approximation. Figure: from left to right: the original curves, the densities of points, the embeddings via the graph Laplacian (α = 0), and the embeddings via the Laplace-Beltrami approximation (α = 1). In the latter case, the curve is embedded as a perfect circle and the arclength parametrization is recovered.

Parametrization of Curves. Figure: live example from [LafonWWW]

Summary. 1. Diffusion map: Ψ_t : X → R^s, Ψ_t(x) = (λ_1^t ψ_1(x), ..., λ_s^t ψ_s(x))^T embeds the data into Euclidean space so that ||Ψ_t(x) − Ψ_t(y)|| = D_t(x, y). 2. Laplace-Beltrami approximation: k^(α)(x, y) = k(x, y) / (q^α(x) q^α(y)); the normalization parameter α steers the influence of the density and allows the complete separation of the distribution of the data from the geometry of the underlying manifold.

Coifman06: R.R. Coifman, S. Lafon: Diffusion maps. Appl. Comput. Harmon. Anal. 21(1), 5-30 (2006).
LafonWWW: Stephane Lafon's website: http://www.math.yale.edu/~sl349/
Luxburg07: U. von Luxburg: A Tutorial on Spectral Clustering. Statistics and Computing 17(4), 395-416 (2007).
Hein07: M. Hein, J.-Y. Audibert, U. von Luxburg: Graph Laplacians and their convergence on random neighborhood graphs. Journal of Machine Learning Research 8, 1325-1370 (2007).
Learned07: Manifold picture from http://www.cs.umass.edu/~elm/papers_by_research.html
Maggioni06: M. Maggioni: Diffusion maps and wavelet bases for value function approximation and their connection to kernel methods. ICML Workshop, Yale University, June 29th, 2006.

Thank you.