Diffusion Geometries, Diffusion Wavelets and Harmonic Analysis of large data sets.


Diffusion Geometries, Diffusion Wavelets and Harmonic Analysis of large data sets. R.R. Coifman, S. Lafon, M. Maggioni. Mathematics Department, Program of Applied Mathematics, Yale University

Motivations

The main problem is to analyze lots of data in high dimensions.

Paradigm: we have a large number of documents (e.g. web pages, gene-array data, (hyper)spectral data, molecular-dynamics data, etc.) and a way of measuring similarity between pairs.

Model: a weighted graph (V, E, W). In some cases the vertices are points in high-dimensional Euclidean space and the weights are a function of Euclidean distance.

Problems:
- Understand data sets in high dimensions, and classes of functions on them
- Approximation and learning of such functions
- Parametrize low-dimensional data sets embedded in high dimension
- Fast algorithms

High-dimensional data: examples
- Biotech data (gene arrays, proteomic data)
- Customer databases: companies collect and process information on (potential) customers
- Financial data
- Web searching
- Satellite imagery

However... in many situations constraints force the data to lie on sets which have a very small intrinsic dimensionality compared to that of the ambient space. In the case of graphs, or arbitrary metric spaces, there are notions of intrinsic complexity, or of embeddability in low-dimensional Hilbert spaces.

Curse of dimensionality

The high dimension is an obstacle to the processing of the data:
- Approximation of functions: to represent a C^1 function on a grid in dimension n with accuracy ε, one needs on the order of ε^{-n} grid points
- Density estimation is difficult: one needs a lot of data points, otherwise most bins are empty
- The computational cost of many algorithms grows exponentially with the dimension (e.g. nearest-neighbor search, the Fast Multipole Method)
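To make the ε^{-n} growth concrete, here is a tiny sketch; the target accuracy ε = 0.1 is an arbitrary illustrative choice, not a value from the talk:

```python
# Illustrative only: the number of grid points needed to represent
# a C^1 function with accuracy eps scales like eps^(-n) in dimension n.
eps = 0.1  # hypothetical target accuracy

for n in (1, 2, 5, 10):
    grid_points = round(eps ** -n)
    print(f"n = {n:2d}: ~{grid_points:.1e} grid points")
```

Already at n = 10, a modest ten-points-per-axis grid requires 10^10 samples, which is why grid-based approximation breaks down in high dimension.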

Diffusion Geometries (R.R. Coifman & S. Lafon)

Geodesic distance ---> diffusion distance. Diffusion distance is more stable: it uses a preponderance of evidence.
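The stability claim can be checked numerically. In this hypothetical sketch (not from the talk), two 10-cliques are joined by a single bridge edge: the geodesic between the cliques is only two hops through the bridge, yet the diffusion distance D_m^2(x,y) = p_m(x,x) + p_m(y,y) - 2 p_m(x,y), computed from a lazy, symmetrically normalized walk so that p_m is symmetric positive semidefinite, still keeps the cliques far apart:

```python
# Hypothetical illustration: two 10-cliques joined by one bridge edge.
# The geodesic across the bridge is short, but the diffusion distance,
# which aggregates evidence over ALL paths, separates the cliques.
import numpy as np

n = 10
W = np.zeros((2 * n, 2 * n))
W[:n, :n] = 1.0                    # clique 1 (self-weights kept for simplicity)
W[n:, n:] = 1.0                    # clique 2
W[n - 1, n] = W[n, n - 1] = 1.0    # the single bridge edge

deg = W.sum(axis=1)
S = W / np.sqrt(np.outer(deg, deg))   # symmetric normalization of the walk
S = (np.eye(2 * n) + S) / 2           # lazy walk: symmetric and PSD
Sm = np.linalg.matrix_power(S, 4)     # m-step kernel p_m with m = 4

def dist2(x, y):
    """Diffusion distance squared: p_m(x,x) + p_m(y,y) - 2*p_m(x,y)."""
    return Sm[x, x] + Sm[y, y] - 2 * Sm[x, y]

print(dist2(0, 5) < dist2(0, 2 * n - 1))  # True: within-clique << across-bridge
```

A single spurious edge would shorten the geodesic dramatically, but it barely perturbs the diffusion distance, since one extra path among many carries little probability mass.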

On the graph of documents with similarities there is a natural random walk: we get a Markov chain represented by a matrix P(x,y). If P is symmetric and positive semidefinite, we can define the diffusion distance by

D_m^2(x,y) = p_m(x,x) + p_m(y,y) - 2 p_m(x,y) = || p_m(x,·) - p_m(y,·) ||^2 = Σ_j λ_j^{2m} (φ_j(x) - φ_j(y))^2

Geometric diffusion map:

x ↦ Φ(x) = { λ_i^m φ_i(x) }_i ∈ ℓ^2

This embeds the graph in Euclidean space, up to the chosen precision, via the eigenfunctions, mapping diffusion distance into Euclidean distance. For a set of points in Euclidean space, sampled from a Riemannian manifold, one can build a discretized Laplace-Beltrami operator (associated to the canonical Brownian motion constrained to the manifold) and map the manifold with diffusion distance isometrically into Euclidean space.
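As a concrete sketch (not the authors' code), here is a minimal diffusion-map embedding in NumPy. The Gaussian kernel bandwidth `sigma`, the power `m`, and the number of coordinates `k` are illustrative choices:

```python
# Minimal diffusion-map sketch. Embeds rows of X via
# x -> (lambda_j^m * phi_j(x))_{j=1..k}, where lambda_j, phi_j are
# eigenvalues/right eigenfunctions of the random-walk matrix P = D^{-1} W.
import numpy as np

def diffusion_map(X, sigma=1.0, m=1, k=2):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.exp(-d2 / (2 * sigma ** 2))                   # Gaussian affinities
    D = W.sum(axis=1)
    S = W / np.sqrt(np.outer(D, D))      # symmetric conjugate of P: same spectrum
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]       # eigenvalues in decreasing order
    vals, vecs = vals[order], vecs[:, order]
    phi = vecs / np.sqrt(D)[:, None]     # right eigenvectors of P
    # Skip j = 0: the trivial constant eigenfunction with eigenvalue 1.
    return (vals[1:k + 1] ** m) * phi[:, 1:k + 1]

# Usage: a noisy circle in the plane embeds into two diffusion coordinates.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
X = np.c_[np.cos(t), np.sin(t)] + 0.01 * rng.normal(size=(200, 2))
Y = diffusion_map(X, sigma=0.5, m=2, k=2)
print(Y.shape)  # (200, 2)
```

Euclidean distances between rows of `Y` then approximate diffusion distances on the original point set, as in the formula above.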

[Figure: original points and their embeddings; coordinates given by the eigenfunctions Phi1, Phi2, Phi3]

Diffusion Wavelets (R.R. Coifman & M. Maggioni)

Eigenfunctions are like a global Fourier analysis on the data set: they live in different frequency bands, but they are not localized. We would like elements that are localized both in frequency and in space (compatibly with Heisenberg principles), and critically sampled at the rate corresponding to the frequency band. Where are the frequencies?

[Figure: spectra of the powers T^2, T^4, T^8, T^16 of the diffusion operator, defining a chain of nested subspaces V_0 ⊇ V_1 ⊇ V_2 ⊇ V_3]

Multiresolution diffusion wavelet construction of orthonormal diffusion scaling functions.

All this can be done in O(n log n) operations, where n is the cardinality of the space!
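The O(n log n) cost rests on the eigenvalue decay of the diffusion operator: the numerical rank of T^(2^j) shrinks rapidly with j, so the construction only ever compresses low-rank powers. A small hypothetical check, using a symmetric random walk on a path graph as the diffusion operator:

```python
# Hypothetical check: the numerical rank (number of eigenvalues above a
# precision eps) of T^(2^j) shrinks quickly as j grows; this is what
# makes compressing the powers of T cheap.
import numpy as np

n, eps = 200, 1e-6
T = np.zeros((n, n))
for i in range(n):
    T[i, max(i - 1, 0)] += 0.5      # step left (reflect at the boundary)
    T[i, min(i + 1, n - 1)] += 0.5  # step right (reflect at the boundary)

vals = np.sort(np.abs(np.linalg.eigvalsh(T)))[::-1]  # |eigenvalues|, decreasing
ranks = [int((vals ** (2 ** j) > eps).sum()) for j in range(6)]
print(ranks)  # numerical ranks of T, T^2, T^4, ..., T^32 (non-increasing)
```

Each squaring of the operator discards the eigenvalues that have decayed below precision, which is exactly the mechanism the multiresolution construction exploits scale by scale.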

Fast multipole method for generalized potentials


[Figure: embedding via three diffusion scaling functions at scale 16, (φ_{16,1}(x), φ_{16,2}(x), φ_{16,3}(x))]

Comments, Applications, etc...
- This is wavelet analysis on manifolds (and more, e.g. fractals), graphs, and Markov chains, while Laplacian eigenfunctions do Fourier analysis on manifolds (and fractals, etc.).
- We are compressing: powers of the operator; functions of the operator; the subspaces of functions on which its powers act (Heisenberg principle...); and the space itself (sampling theorems, quadrature formulas...).
- We are constructing a biorthogonal version of the transform (better adapted to studying Markov chains) and wavelet packets: this will allow efficient denoising, compression, and discrimination on all the spaces mentioned above.
- The multiscale spaces are a natural scale of complexity spaces for learning empirical functions on the data set.
- Diffusion wavelets extend outside the set in a natural multiscale fashion.
- To be tied with the measure-geometric considerations used to embed metric spaces in Euclidean spaces with small distortion.
- Study and compression of dynamical systems.