Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs


Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/

Outline of Lecture 9 Kernels on graphs The exponential diffusion kernel The heat kernel of discrete manifolds Properties of the heat kernel The auto-diffusion function Shape matching

Material for this lecture:
P. Bérard, G. Besson & G. Gallot. Embedding Riemannian manifolds by their heat kernel. Geometric and Functional Analysis (1994). A rather theoretical paper.
J. Shawe-Taylor & N. Cristianini. Kernel Methods in Pattern Analysis, chapter 10. A mild introduction to graph kernels.
R. Kondor & J.-P. Vert. Diffusion Kernels. In Kernel Methods in Computational Biology, ed. B. Schölkopf, K. Tsuda and J.-P. Vert, The MIT Press (2004). An interesting paper to read.
J. Sun, M. Ovsjanikov & L. Guibas. A concise and provably informative multi-scale signature based on heat diffusion. Symposium on Geometry Processing (2009).
A. Sharma & R. Horaud. Shape matching based on diffusion embedding and on mutual isometric consistency. NORDIA workshop (2010).

The Adjacency Matrix of a Graph (from Lecture #3)
For a graph with n vertices and with binary edges, the entries of the $n \times n$ adjacency matrix are defined by:
$$A := \begin{cases} A_{ij} = 1 & \text{if there is an edge } e_{ij} \\ A_{ij} = 0 & \text{if there is no edge} \\ A_{ii} = 0 & \end{cases}$$
Example:
$$A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

The Walks of Length 2
The number of walks of length 2 between two graph vertices:
$$N_2(v_i, v_j) = A^2(i, j), \qquad A^2 = \begin{pmatrix} 2 & 1 & 1 & 1 \\ 1 & 3 & 1 & 0 \\ 1 & 1 & 2 & 1 \\ 1 & 0 & 1 & 1 \end{pmatrix}$$
Examples: $N_2(v_2, v_3) = 1$; $N_2(v_2, v_2) = 3$.

The Number of Walks of Length k
Theorem. The number of walks of length k joining any two nodes $v_i$ and $v_j$ of a binary graph is given by the $(i, j)$ entry of the matrix $A^k$:
$$N_k(v_i, v_j) = A^k(i, j)$$
A proof of this theorem is provided in: Godsil and Royle (2001). Algebraic Graph Theory. Springer.
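As a quick numerical illustration of this theorem (a minimal NumPy sketch, not part of the original slides), one can check the walk counts on the 4-vertex example graph above:

```python
import numpy as np

# Adjacency matrix of the 4-vertex example graph from the earlier slide.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])

# N_k(v_i, v_j) = A^k(i, j): number of walks of length k from v_i to v_j.
A2 = np.linalg.matrix_power(A, 2)
print(A2)        # [[2 1 1 1] [1 3 1 0] [1 1 2 1] [1 0 1 1]]
print(A2[1, 2])  # N_2(v_2, v_3) = 1  (0-based indexing)
print(A2[1, 1])  # N_2(v_2, v_2) = 3
```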

The Similarity Matrix of an Undirected Weighted Graph
We consider undirected weighted graphs; each edge $e_{ij}$ is weighted by $\omega_{ij} > 0$. We obtain:
$$\Omega := \begin{cases} \Omega(i, j) = \omega(v_i, v_j) = \omega_{ij} & \text{if there is an edge } e_{ij} \\ \Omega(i, j) = 0 & \text{if there is no edge} \\ \Omega(i, i) = 0 & \end{cases}$$
We will refer to this matrix as the base similarity matrix.

Walks of Length 2
The weight of the walk from $v_i$ to $v_j$ through $v_l$ is defined as the product of the corresponding edge weights: $\omega(v_i, v_l)\,\omega(v_l, v_j)$. The sum of the weights of all walks of length 2:
$$\omega_2(v_i, v_j) = \sum_{l=1}^{n} \omega(v_i, v_l)\,\omega(v_l, v_j) = \Omega^2(i, j)$$

Associated Feature Space
Let $\Omega = U \Lambda U^\top$ be the spectral decomposition of this real symmetric matrix. Then $\Omega^2 = U \Lambda^2 U^\top$ is symmetric positive semi-definite, hence it can be interpreted as a kernel matrix, with:
$$\omega_2(v_i, v_j) = \sum_{l=1}^{n} \omega(v_i, v_l)\,\omega(v_l, v_j) = \left\langle (\omega(v_i, v_l))_{l=1}^{n},\, (\omega(v_j, v_l))_{l=1}^{n} \right\rangle = \kappa(v_i, v_j)$$
Associated feature space:
$$\phi : v_i \mapsto \phi(v_i) = (\omega(v_i, v_1) \dots \omega(v_i, v_l) \dots \omega(v_i, v_n))^\top$$
It is thus possible to enhance the base similarity matrix by linking vertices along walks of length 2.

Combining Powers of the Base Similarity Matrix
More generally one can take powers of the base similarity matrix as well as linear combinations of these matrices:
$$\alpha_1 \Omega + \dots + \alpha_k \Omega^k + \dots + \alpha_K \Omega^K$$
The eigenvalues of such a matrix must be nonnegative to satisfy the kernel condition:
$$\alpha_1 \Lambda + \dots + \alpha_k \Lambda^k + \dots + \alpha_K \Lambda^K \succeq 0$$

The Exponential Diffusion Kernel
Consider the following combination of powers of the base similarity matrix:
$$K = \sum_{k=0}^{\infty} \frac{\mu^k}{k!} \Omega^k$$
This corresponds to:
$$K = e^{\mu \Omega}$$
where $\mu$ is a decay factor chosen such that the influence of longer walks decreases, since they are less reliable. Spectral decomposition: $K = U e^{\mu \Lambda} U^\top$. Hence $K$ is a kernel matrix, since its eigenvalues $e^{\mu \lambda_i} \ge 0$.
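A minimal sketch of computing this kernel (assuming NumPy/SciPy and a small symmetric base similarity matrix `Omega`; this is an illustration, not code from the slides):

```python
import numpy as np
from scipy.linalg import expm

def exponential_diffusion_kernel(Omega, mu):
    """K = exp(mu * Omega), via the spectral decomposition of the symmetric Omega."""
    lam, U = np.linalg.eigh(Omega)
    return U @ np.diag(np.exp(mu * lam)) @ U.T

# Sanity check against a generic matrix exponential on a toy similarity matrix:
Omega = np.array([[0., 1., 1.],
                  [1., 0., 0.],
                  [1., 0., 0.]])
K = exponential_diffusion_kernel(Omega, mu=0.1)
assert np.allclose(K, expm(0.1 * Omega))
```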

Heat Diffusion on a Graph
The heat-diffusion equation on a Riemannian manifold:
$$\left( \frac{\partial}{\partial t} + \Delta_M \right) f(x; t) = 0$$
$\Delta_M$ denotes the geometric Laplace-Beltrami operator and $f(x; t)$ is the distribution of heat over the manifold at time $t$. By extension, $\frac{\partial}{\partial t} + \Delta_M$ can be referred to as the heat operator [Bérard et al. 1994]. This equation can also be written on a graph:
$$\left( \frac{d}{dt} + L \right) F(t) = 0$$
where the vector $F(t) = (F_1(t) \dots F_n(t))^\top$ is indexed by the nodes of the graph.

The Fundamental Solution
The fundamental solution of the heat-diffusion equation on Riemannian manifolds carries over to the discrete case, i.e., to undirected weighted graphs. The solution in the discrete case is:
$$F(t) = H(t)\, f$$
where $H$ denotes the discrete heat operator, $H(t) = e^{-tL}$, and $f$ corresponds to the initial heat distribution, $F(0) = f$. Starting with a point-heat distribution at vertex $v_i$, e.g., $f = (0 \dots f_i = 1 \dots 0)^\top$, the distribution at time $t$, i.e., $F(t) = (F_1(t) \dots F_n(t))^\top$, is given by the i-th column of the heat operator.

How to Compute the Heat Matrix?
The exponential of a matrix:
$$e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!}$$
Hence:
$$e^{-tL} = \sum_{k=0}^{\infty} \frac{(-t)^k}{k!} L^k$$
It belongs to the exponential diffusion family of kernels just introduced, with $\mu = -t$, $t > 0$.

Spectral Properties of L
We start by recalling some basic facts about the combinatorial graph Laplacian:
Symmetric positive semi-definite matrix: $L = U \Lambda U^\top$
Eigenvalues: $0 = \lambda_1 < \lambda_2 \le \dots \le \lambda_n$
Eigenvectors: $u_1 = \mathbf{1}, u_2, \dots, u_n$; $\lambda_2$ and $u_2$ are the Fiedler value and the Fiedler vector
$u_i^\top u_j = \delta_{ij}$
$u_{k>1}^\top \mathbf{1} = 0$, i.e., $\sum_{i=1}^{n} u_{ik} = 0$ for $k \in \{2, \dots, n\}$
$-1 < u_{ik} < 1$ for $i \in \{1, \dots, n\}$, $k \in \{2, \dots, n\}$
$$L = \sum_{k=2}^{n} \lambda_k u_k u_k^\top$$

The Heat-kernel Matrix
$$H(t) = e^{-t U \Lambda U^\top} = U e^{-t\Lambda} U^\top \succeq 0$$
with $e^{-t\Lambda} = \mathrm{Diag}[e^{-t\lambda_1} \dots e^{-t\lambda_n}]$.
Eigenvalues: $1 = e^{-t \cdot 0} > e^{-t\lambda_2} \ge \dots \ge e^{-t\lambda_n}$
Eigenvectors: the same as those of the Laplacian matrix, with their properties (previous slide).
The heat trace (also referred to as the partition function):
$$Z(t) = \mathrm{tr}(H) = \sum_{k=1}^{n} e^{-t\lambda_k}$$
The determinant:
$$\det(H) = \prod_{k=1}^{n} e^{-t\lambda_k} = e^{-t\,\mathrm{tr}(L)} = e^{-t\,\mathrm{vol}(G)}$$

The Heat-kernel
Computing the heat matrix:
$$H(t) = \sum_{k=2}^{n} e^{-t\lambda_k} u_k u_k^\top$$
where we applied a deflation to get rid of the constant eigenvector: $H \leftarrow H - u_1 u_1^\top$. The heat kernel (an entry of the matrix above):
$$h(i, j; t) = \sum_{k=2}^{n} e^{-t\lambda_k} u_{ik} u_{jk}$$
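A minimal sketch of this spectral computation of $H(t)$ (assuming NumPy and a connected graph, so that only the first eigenvalue is zero; an illustration rather than the authors' code):

```python
import numpy as np

def heat_kernel(L, t):
    """H(t) = sum_{k>=2} exp(-t * lambda_k) u_k u_k^T, for a combinatorial Laplacian L."""
    lam, U = np.linalg.eigh(L)     # lam[0] = 0, U[:, 0] = constant eigenvector
    lam, U = lam[1:], U[:, 1:]     # deflation: drop the constant eigenvector
    return (U * np.exp(-t * lam)) @ U.T

# Example: Laplacian L = D - A of the 4-vertex graph used earlier.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
H = heat_kernel(L, t=0.5)          # h(i, j; t) = H[i, j]
```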

Feature-space Embedding Using the Heat Kernel
$$H(t) = \left( U e^{-\frac{1}{2}t\Lambda} \right) \left( U e^{-\frac{1}{2}t\Lambda} \right)^\top$$
Each row of the $n \times n$ matrix $U e^{-t\Lambda/2}$ can be viewed as the coordinates of a graph vertex in a feature space, i.e., the mapping $\phi : V \rightarrow \mathbb{R}^{n-1}$, $x_i = \phi(v_i)$:
$$x_i = \left( e^{-\frac{1}{2}t\lambda_2} u_{i2} \dots e^{-\frac{1}{2}t\lambda_k} u_{ik} \dots e^{-\frac{1}{2}t\lambda_n} u_{in} \right)^\top = (x_{i2} \dots x_{ik} \dots x_{in})^\top$$
The heat kernel computes the inner product in feature space:
$$h(i, j; t) = \langle \phi(v_i), \phi(v_j) \rangle$$

The Auto-diffusion Function
Each diagonal term of the heat matrix corresponds to the squared norm of a feature-space point:
$$h(i, i; t) = \sum_{k=2}^{n} e^{-t\lambda_k} u_{ik}^2 = \| x_i \|^2$$
This is also known as the auto-diffusion function (ADF), i.e., the amount of heat that remains at a vertex at time $t$. The local maxima/minima of this function have been used for a feature-based scale-space representation of shapes. The associated shape descriptor, $v_i \mapsto h(i, i; t)$, is therefore a scalar function defined over the graph.
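Since the ADF is simply the diagonal of $H(t)$, it can be computed with the `heat_kernel` sketch given above (again, an illustration only, reusing the Laplacian `L` from the earlier example):

```python
def auto_diffusion_function(L, t):
    """ADF: h(i, i; t) for every vertex, i.e. the squared feature-space norms."""
    return np.diag(heat_kernel(L, t))

adf = auto_diffusion_function(L, t=0.5)   # one scalar per vertex of the graph
```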

The ADF as a Shape Descriptor (figure: the ADF displayed on a shape at t = 0, t = 50, and t = 500000)

Spectral Distances
The heat distance:
$$d_t^2(i, j) = h(i, i; t) + h(j, j; t) - 2h(i, j; t) = \sum_{k=2}^{n} \left( e^{-\frac{1}{2}t\lambda_k} (u_{ik} - u_{jk}) \right)^2$$
The commute-time distance:
$$d_{\mathrm{CTD}}^2(i, j) = \int_{t=0}^{\infty} \sum_{k=2}^{n} \left( e^{-\frac{1}{2}t\lambda_k} (u_{ik} - u_{jk}) \right)^2 dt = \sum_{k=2}^{n} \left( \lambda_k^{-1/2} (u_{ik} - u_{jk}) \right)^2$$
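Both distances only involve the Laplacian spectrum; a minimal sketch (same NumPy setup and connected-graph assumption as above, not from the slides):

```python
def heat_distance(L, t, i, j):
    """d_t(i, j): Euclidean distance between v_i and v_j in the heat-kernel embedding."""
    lam, U = np.linalg.eigh(L)
    lam, U = lam[1:], U[:, 1:]
    diff = U[i] - U[j]
    return np.sqrt(np.sum(np.exp(-t * lam) * diff ** 2))

def commute_time_distance(L, i, j):
    """d_CTD(i, j): the heat distance integrated over t, i.e. weights 1/lambda_k."""
    lam, U = np.linalg.eigh(L)
    lam, U = lam[1:], U[:, 1:]
    diff = U[i] - U[j]
    return np.sqrt(np.sum(diff ** 2 / lam))
```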

Principal Component Analysis
The covariance matrix in feature space:
$$C_X = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top$$
with:
$$X = \left( U e^{-\frac{1}{2}t\Lambda} \right)^\top = [x_1 \dots x_i \dots x_n]$$
Remember that each column of $U$, except the constant eigenvector, sums to zero, and that
$$-1 < -e^{-\frac{1}{2}t\lambda_k} < x_{ik} < e^{-\frac{1}{2}t\lambda_k} < 1, \quad 2 \le k \le n$$

Principal Component Analysis: The Mean
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{n}\, e^{-\frac{1}{2}t\Lambda} \begin{pmatrix} \sum_{i=1}^{n} u_{i2} \\ \vdots \\ \sum_{i=1}^{n} u_{in} \end{pmatrix} = 0$$

Principal Component Analysis: The Covariance
$$C_X = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top = \frac{1}{n} X X^\top = \frac{1}{n} \left( U e^{-\frac{1}{2}t\Lambda} \right)^\top \left( U e^{-\frac{1}{2}t\Lambda} \right) = \frac{1}{n}\, e^{-t\Lambda}$$

Result I: The PCA of a Graph
The eigenvectors of the combinatorial Laplacian are the principal components of the heat-kernel embedding: hence we obtain a maximum-variance embedding. The associated variances are $e^{-t\lambda_2}/n, \dots, e^{-t\lambda_n}/n$. The embedded points are strictly contained in a hyper-parallelepiped with volume $\prod_{i=2}^{n} e^{-t\lambda_i}$.
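This result is easy to verify numerically; a sketch reusing the Laplacian `L` from the earlier example (an illustration, not a proof): the heat-kernel embedding is centred and its covariance is the diagonal matrix $e^{-t\Lambda}/n$.

```python
t = 0.5
lam, U = np.linalg.eigh(L)
lam, U = lam[1:], U[:, 1:]             # drop the zero eigenvalue / constant eigenvector
X = U * np.exp(-0.5 * t * lam)         # rows of X are the embedded points x_i

n = X.shape[0]
C = (X.T @ X) / n                      # feature-space covariance (the mean is zero)
print(np.allclose(X.mean(axis=0), 0))                      # centred embedding
print(np.allclose(C, np.diag(np.exp(-t * lam)) / n))       # variances e^{-t*lambda_k}/n
```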

Dimensionality Reduction (1)
Dimensionality reduction consists of keeping the K largest variances $e^{-t\lambda_2}/n, \dots, e^{-t\lambda_{K+1}}/n$, $K < n$, conditioned by $t$; hence the criterion (scree diagram): choose $K$ and $t$ such that
$$\alpha(K) = \frac{\sum_{i=2}^{K+1} e^{-t\lambda_i}/n}{\sum_{i=2}^{n} e^{-t\lambda_i}/n} \ge 0.95$$
This is not practical, because one needs to compute all the eigenvalues.
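A sketch of this scree criterion (hypothetical helper, not from the slides; it assumes the full set of non-zero eigenvalues `lam` is available, which is exactly the practical limitation noted above):

```python
def choose_K(lam, t, threshold=0.95):
    """Smallest K whose first K variances e^{-t*lambda} reach `threshold` of the total."""
    w = np.exp(-t * np.asarray(lam))       # lam = non-zero Laplacian eigenvalues, ascending
    ratio = np.cumsum(w) / np.sum(w)
    return int(np.searchsorted(ratio, threshold)) + 1

K = choose_K(lam, t=0.5)                   # reuses lam from the previous snippet
```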

Dimensionality Reduction (2)
An alternative possibility is to use the determinant of the covariance matrix and to choose the first K eigenvectors such that (with $\alpha > 1$):
$$\alpha(K) = \frac{\prod_{i=2}^{K+1} e^{-t\lambda_i}/n}{\prod_{i=2}^{n} e^{-t\lambda_i}/n}$$
which yields:
$$\ln \alpha(K) = t \left( \mathrm{tr}(L) - \sum_{i=2}^{K+1} \lambda_i \right) + (n - K) \ln n$$
This allows one to choose K for a given scale t.

Normalizing the Feature-space
Observe that the heat kernels collapse to 0 at infinity: $\lim_{t \to \infty} h(i, j; t) = 0$. To prevent this, several normalizations are possible:
Trace normalization
Unit hyper-sphere normalization
Time-invariant embedding

Trace Normalization
Observe that $\lim_{t \to \infty} h(i, j; t) = 0$. Use the trace of the operator to normalize the embedding:
$$\hat{x}_i = \frac{x_i}{\sqrt{Z(t)}}, \qquad Z(t) = \sum_{k=2}^{K+1} e^{-t\lambda_k}$$
The k-th component of the i-th coordinate vector writes:
$$\hat{x}_{ik}(t) = \frac{e^{-\frac{1}{2}t\lambda_k}\, u_{ik}}{\left( \sum_{l=2}^{K+1} e^{-t\lambda_l} \right)^{1/2}}$$
At the limit:
$$\hat{x}_i(t \to \infty) = \left( \frac{u_{i2}}{\sqrt{m}} \dots \frac{u_{i\,m+1}}{\sqrt{m}}\ 0 \dots 0 \right)$$
where m is the multiplicity of the first non-null eigenvalue.

Unit Hyper-sphere Normalization
The embedding lies on a unit hyper-sphere of dimension K:
$$\hat{x}_i = \frac{x_i}{\| x_i \|}$$
The heat distance becomes a geodesic distance on a spherical manifold:
$$d_S(i, j; t) = \arccos \hat{x}_i^\top \hat{x}_j = \arccos \frac{h(i, j; t)}{\left( h(i, i; t)\, h(j, j; t) \right)^{1/2}}$$
At the limit (m is the multiplicity of the largest non-null eigenvalue):
$$\hat{x}_i(t \to \infty) = \left( \frac{u_{i2}}{\left( \sum_{l=2}^{m+1} u_{il}^2 \right)^{1/2}} \dots \frac{u_{i\,m+1}}{\left( \sum_{l=2}^{m+1} u_{il}^2 \right)^{1/2}}\ 0 \dots 0 \right)$$
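A sketch of the spherical normalization and the resulting geodesic distance (same NumPy setup and connected-graph assumption as before; an illustration only):

```python
def sphere_embedding(L, t, K):
    """K-dimensional heat-kernel embedding projected onto the unit hyper-sphere."""
    lam, U = np.linalg.eigh(L)
    lam, U = lam[1:K + 1], U[:, 1:K + 1]
    X = U * np.exp(-0.5 * t * lam)
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def spherical_distance(L, t, K, i, j):
    """Geodesic (arc-length) distance between the normalized embeddings of v_i and v_j."""
    Xs = sphere_embedding(L, t, K)
    return np.arccos(np.clip(Xs[i] @ Xs[j], -1.0, 1.0))
```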

Time-invariant Embedding
Integration of the heat kernel over time:
$$L^{\dagger} = \int_{0}^{\infty} H(t)\, dt = \int_{0}^{\infty} \sum_{k=2}^{n} e^{-t\lambda_k} u_k u_k^\top\, dt = \sum_{k=2}^{n} \frac{1}{\lambda_k} u_k u_k^\top = U \Lambda^{\dagger} U^\top$$
with $\Lambda^{\dagger} = \mathrm{Diag}[\lambda_2^{-1}, \dots, \lambda_n^{-1}]$. The matrix $L^{\dagger}$ is called the discrete Green's function [Chung & Yau 2000]; it is the Moore-Penrose pseudo-inverse of the Laplacian.
Embedding: $x_i = \left( \lambda_2^{-1/2} u_{i2} \dots \lambda_{K+1}^{-1/2} u_{i\,K+1} \right)^\top$
Covariance: $C_X = \frac{1}{n} \mathrm{Diag}[\lambda_2^{-1}, \dots, \lambda_{K+1}^{-1}]$
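A sketch of the time-invariant (commute-time) embedding under the same assumptions as the previous snippets (an illustration, not the authors' implementation):

```python
def commute_time_embedding(L, K):
    """x_i = (lambda_2^{-1/2} u_{i2}, ..., lambda_{K+1}^{-1/2} u_{i,K+1})."""
    lam, U = np.linalg.eigh(L)
    lam, U = lam[1:K + 1], U[:, 1:K + 1]
    return U / np.sqrt(lam)

# Euclidean distances between rows of this embedding are the commute-time distances d_CTD.
```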

Examples of Normalized Embeddings (figure: normalized embeddings shown at t = 50, t = 5000, and t = 500000)

Shape Matching (1) (figure: matched shapes at scales t = 200, t′ = 201.5 and t = 90, t′ = 1005)

Shape Matching (2)

Shape Matching (3)

Sparse Shape Matching
Shape/graph matching is equivalent to matching the embedded representations [Mateus et al. 2008]. Here we use the projection of the embeddings onto a unit hyper-sphere of dimension K and we apply rigid matching. Two questions arise: How to select t and t′, i.e., the scales associated with the two shapes to be matched? How to implement a robust matching method?

Scale Selection
Let $C_X$ and $C_{X'}$ be the covariance matrices of two different embeddings $X$ and $X'$, with respectively $n$ and $n'$ points. We impose:
$$\det(C_X) = \det(C_{X'})$$
$\det(C_X)$ measures the volume in which the embedding $X$ lies; hence we impose that the two embeddings are contained in the same volume. From this constraint we derive:
$$t'\, \mathrm{tr}(L') = t\, \mathrm{tr}(L) + K \log \frac{n}{n'}$$
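Solving this constraint for t′ given t is a one-liner; a hypothetical helper (names, signature and the direction of the n/n′ ratio follow the reconstruction above and are illustrative only):

```python
def select_scale(L, L_prime, t, K, n, n_prime):
    """Solve t' * tr(L') = t * tr(L) + K * log(n/n') for the scale t' of the second shape."""
    return (t * np.trace(L) + K * np.log(n / n_prime)) / np.trace(L_prime)
```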

Robust Matching
Build an association graph and search for the largest set of mutually compatible nodes (maximal clique finding). See [Sharma and Horaud 2010] (NORDIA workshop) for more details. (Figure: an association graph whose nodes are candidate assignments (i, i′), (i, j′), (i, l′), (j, j′), (l, l′), (j, k′), (k, k′).)