A Kernel on Persistence Diagrams for Machine Learning


Jan Reininghaus¹, Stefan Huber¹, Roland Kwitt², Ulrich Bauer¹
¹ Institute of Science and Technology Austria   ² FB Computer Science, Universität Salzburg, Austria
Toposys Annual Meeting, IST Austria, September 9, 2014

Pipeline in topological machine learning
Tasks: shape retrieval, object recognition, texture recognition.
Pipeline: topological data analysis → machine learning (SVM, kernel PCA, k-means).
We want a robust, multi-scale kernel on the set of persistence diagrams!

Background on kernels
Definition. Given a non-empty set $X$, a (positive definite) kernel is a symmetric, positive-definite function $k\colon X \times X \to \mathbb{R}$, i.e.
$k(x, x') = k(x', x)$ for all $x, x' \in X$, (1)
$[k(x_i, x_j)]_{i,j=1}^{n}$ is positive semi-definite for all $x_1, \ldots, x_n \in X$ and $n > 0$. (2)
Example: An inner product on any vector space $V$ is a kernel.

Background on kernels
Theorem. For any kernel $k$ on a set $X$ there exists a Hilbert space $H$ and a mapping $\varphi\colon X \to H$ with
$k(x, x') = (\varphi(x), \varphi(x'))_H$ for all $x, x' \in X$. (3)

Background on kernels: Examples
Given a vector space $V$ with inner product $(\cdot, \cdot)$:
$k(x, x') = (x, x')$ is a kernel.
$k(x, x') = (x, x')^d$, with integer $d \ge 0$, is a kernel.
$k(x, x') = e^{-\lambda \|x - x'\|^2}$ is a kernel for all $\lambda > 0$: the radial basis function kernel (RBF kernel).
Let $k_1, \ldots, k_m$ be kernels on a set $X$. Then $\sum_{i=1}^{m} \lambda_i k_i$, with $\lambda_i \ge 0$, is a kernel.
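To make these closure properties concrete, here is a minimal numerical sketch (not from the talk; names and parameters are illustrative): it builds Gram matrices of the RBF and polynomial kernels and checks that a non-negative combination is positive semi-definite.

```python
# Minimal sketch: Gram matrices of two kernels and a PSD check of their
# non-negative combination. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                      # 20 sample points in R^3

def rbf_gram(X, lam=1.0):
    """Gram matrix of k(x, x') = exp(-lam * ||x - x'||^2)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-lam * sq_dists)

def poly_gram(X, d=2):
    """Gram matrix of k(x, x') = <x, x'>^d."""
    return (X @ X.T) ** d

K = 0.5 * rbf_gram(X, lam=2.0) + 1.5 * poly_gram(X, d=2)   # lambda_i >= 0
print(np.linalg.eigvalsh(K).min())                # >= 0 up to round-off
```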

Background on kernels: Optimal assignment kernel
Goal: a kernel on the set of finite point sets in $\mathbb{R}^2$.
Definition. Let $X, Y$ be two finite point sets in $\mathbb{R}^2$. The optimal assignment kernel is given as
$k_A(X, Y) = \max_{\pi\colon X \to Y} \sum_{x \in X} k(x, \pi(x))$ for $|X| \le |Y|$, and $k_A(X, Y) = \max_{\pi\colon Y \to X} \sum_{y \in Y} k(\pi(y), y)$ otherwise, (4)
with $\pi$ ranging over injections and $k(\cdot, \cdot)$ being a positive definite kernel for points in $\mathbb{R}^2$.
The optimal assignment kernel with $k(x, y) = e^{-\lambda \|x - y\|^2}$ and $\lambda > 0$ is not positive definite, i.e., not a kernel.
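For illustration, Eq. (4) can be evaluated with a maximum-weight matching; the sketch below is an illustrative reimplementation (not the authors' code), using scipy's linear_sum_assignment to solve the assignment problem.

```python
# Sketch of the optimal assignment kernel k_A from Eq. (4), with a Gaussian
# base kernel. Illustrative only.
import numpy as np
from scipy.optimize import linear_sum_assignment

def optimal_assignment_kernel(X, Y, lam=1.0):
    """k_A(X, Y) with base kernel k(x, y) = exp(-lam * ||x - y||^2)."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    if len(X) > len(Y):                 # always inject the smaller set into the larger
        X, Y = Y, X
    sim = np.exp(-lam * np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1))
    rows, cols = linear_sum_assignment(sim, maximize=True)
    return sim[rows, cols].sum()

# Computable, but (as stated above) not positive definite with the Gaussian
# base kernel, so it is not a valid kernel for kernel methods.
print(optimal_assignment_kernel([(0.0, 1.0), (2.0, 3.0)], [(0.1, 1.1)]))
```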

Hilbert-space embedding
Goal: a kernel on persistence diagrams, i.e., on finite multi-sets of points in $\mathbb{R}^2$.
Goal: an embedding $\Psi\colon \mathcal{D} \to H$ from the set $\mathcal{D}$ of persistence diagrams into a Hilbert space $H$. The inner product of $H$ will serve as kernel:
$k(F, G) = (\Psi(F), \Psi(G))_H$, with $F, G \in \mathcal{D}$.
Or we use the RBF kernel on $H$, or a linear combination of kernels, or ...

Embedding a multi-set of points
Given a persistence diagram $D$: replace each point $p \in D$ with a Dirac delta $\delta_p$ centered at $p$, i.e., map $D \mapsto \sum_{p \in D} \delta_p \in H^{-2}(\mathbb{R}^2)$.
However, the metric induced by $H^{-2}(\mathbb{R}^2)$ is not stable.
We apply heat diffusion with $\sum_{p \in D} \delta_p$ as initial condition and Dirichlet boundary conditions at the diagonal. Solutions are in $L_2(\mathbb{R}^2)$.
Claim: the solutions are stable.

Heat diffusion: Definition
Let $\Omega = \{(x, y) \in \mathbb{R}^2 : y \ge x\}$.
Heat-diffusion PDE. For a given diagram $D$ we consider the solution $u\colon \Omega \times \mathbb{R}_{\ge 0} \to \mathbb{R}$ of
$\partial_{xx} u + \partial_{yy} u = \partial_t u$ in $\Omega \times \mathbb{R}_{> 0}$, (5)
$u = 0$ on $\partial\Omega \times \mathbb{R}_{\ge 0}$, (6)
$u = \sum_{p \in D} \delta_p$ on $\Omega \times \{0\}$. (7)
At scale $\sigma > 0$, the embedding of diagrams is given as
$\Psi_\sigma(D) = u(\cdot, \cdot, \sigma)$. (8)
$\Psi_\sigma(D)$ is in $L_2(\Omega)$, that is, $\Psi_\sigma$ is an $L_2$-embedding.

Heat diffusion: The solution
Extend $\Omega$ to $\mathbb{R}^2$: mirror and negate each $\delta_p$ at the diagonal, where for $p = (a, b)$ we set $\bar{p} = (b, a)$. Restricting the solution of the extended problem to $\Omega$ solves the original problem.
Extended heat-diffusion PDE. For a given diagram $D$ we consider the solution $u\colon \mathbb{R}^2 \times \mathbb{R}_{\ge 0} \to \mathbb{R}$ of
$\partial_{xx} u + \partial_{yy} u = \partial_t u$ in $\mathbb{R}^2 \times \mathbb{R}_{> 0}$, (9)
$u = \sum_{p \in D} (\delta_p - \delta_{\bar{p}})$ on $\mathbb{R}^2 \times \{0\}$. (10)
The solution is given by convolution with a Gaussian kernel:
$u(x, y, t) = \frac{1}{4\pi t} \sum_{p \in D} \left( e^{-\frac{\|(x, y) - p\|^2}{4t}} - e^{-\frac{\|(x, y) - \bar{p}\|^2}{4t}} \right)$. (11)

Heat diffusion: The solution
The $L_2$-embedding at scale $\sigma$ results in
$\Psi_\sigma(D)\colon \Omega \to \mathbb{R},\ (x, y) \mapsto \frac{1}{4\pi\sigma} \sum_{p \in D} \left( e^{-\frac{\|(x, y) - p\|^2}{4\sigma}} - e^{-\frac{\|(x, y) - \bar{p}\|^2}{4\sigma}} \right)$. (12)
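As a quick illustration, Eq. (12) can be evaluated on a grid, e.g. for visualising the feature map. The sketch below assumes diagram points are given as (birth, death) pairs with death ≥ birth; it is not the authors' code.

```python
# Sketch: evaluate the feature map Psi_sigma of Eq. (12) on a grid.
import numpy as np

def psi_sigma(diagram, sigma, xs, ys):
    """Evaluate Psi_sigma(D) on the grid xs x ys, restricted to Omega (y >= x)."""
    D = np.asarray(diagram, float).reshape(-1, 2)   # points p = (birth, death)
    Dbar = D[:, ::-1]                               # mirrored points p-bar = (death, birth)
    gx, gy = np.meshgrid(xs, ys)                    # grid coordinates
    z = np.stack([gx, gy], axis=-1)[..., None, :]   # shape (ny, nx, 1, 2)
    d1 = np.sum((z - D) ** 2, axis=-1)              # squared distances to each p
    d2 = np.sum((z - Dbar) ** 2, axis=-1)           # squared distances to each p-bar
    u = (np.exp(-d1 / (4 * sigma)) - np.exp(-d2 / (4 * sigma))).sum(-1)
    u /= 4 * np.pi * sigma
    return np.where(gy >= gx, u, 0.0)               # Psi_sigma lives on Omega

grid = np.linspace(0.0, 2.0, 100)
img = psi_sigma([(0.2, 1.5), (0.5, 0.9)], sigma=0.05, xs=grid, ys=grid)
```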


The inner product
Given two diagrams $F$ and $G$, we can compute the inner product explicitly:
$(\Psi_\sigma(F), \Psi_\sigma(G)) = \int_\Omega \Psi_\sigma(F)\, \Psi_\sigma(G) = \frac{1}{8\pi\sigma} \sum_{p \in F} \sum_{q \in G} \left( e^{-\frac{\|p - q\|^2}{8\sigma}} - e^{-\frac{\|p - \bar{q}\|^2}{8\sigma}} \right)$. (13)
This takes $O(|F| \cdot |G|)$ time, with no approximation.
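A direct sketch of Eq. (13) in code (illustrative, assuming diagrams are given as lists of (birth, death) pairs; not the authors' implementation):

```python
# Sketch of the closed-form kernel of Eq. (13), O(|F|*|G|) time.
import numpy as np

def pss_kernel(F, G, sigma):
    """k_sigma(F, G) = (Psi_sigma(F), Psi_sigma(G)) per Eq. (13)."""
    F = np.asarray(F, float).reshape(-1, 2)
    G = np.asarray(G, float).reshape(-1, 2)
    if len(F) == 0 or len(G) == 0:
        return 0.0
    Gbar = G[:, ::-1]                                   # q-bar = (death, birth)
    d_pq    = np.sum((F[:, None, :] - G[None, :, :]) ** 2, axis=-1)
    d_pqbar = np.sum((F[:, None, :] - Gbar[None, :, :]) ** 2, axis=-1)
    terms = np.exp(-d_pq / (8 * sigma)) - np.exp(-d_pqbar / (8 * sigma))
    return terms.sum() / (8 * np.pi * sigma)

F = [(0.2, 1.5), (0.5, 0.9)]
G = [(0.25, 1.4)]
print(pss_kernel(F, G, sigma=0.1))
```

Filling a Gram matrix with this function is all that is needed to plug the kernel into standard kernel methods; see the SVM sketch in the experiments section below.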

Robustness
Goal: we want to bound $\|\Psi_\sigma(F) - \Psi_\sigma(G)\|_{L_2}$ by $W_q(F, G)$.
Observation. A linear embedding $\Psi\colon \mathcal{D} \to V$, with $V$ a normed vector space, cannot be stable w.r.t. $W_q$ for $q \in \{2, \ldots, \infty\}$. By linear we mean $\Psi(F + G) = \Psi(F) + \Psi(G)$ and $\Psi(\emptyset) = 0$.
Set $G = \emptyset$ and $F_n = \{u, \ldots, u\}$ ($n$ times). We have
$W_q(F_n, G) = \min_{\eta\colon F_n \to G} \sqrt[q]{\sum_{(u, v) \in \eta} \|u - v\|^q} = \sqrt[q]{n}\, W_q(F_1, G)$,
but $\|\Psi(F_n) - \Psi(G)\|_V = n\, \|\Psi(F_1)\|_V$.

Robustness
Theorem. For any two persistence diagrams $F$ and $G$, and any $\sigma > 0$, we have
$\|\Psi_\sigma(F) - \Psi_\sigma(G)\|_{L_2(\Omega)} \le \frac{1}{\sigma\sqrt{8\pi}}\, W_1(F, G)$. (14)
Multi-scale and noise: noise on the input data causes $W_1(F, G)$ to increase. We can counteract this by increasing $\sigma$.

Robustness: Proof I
Proof. Let $\eta = \arg\min_{\eta\colon F \to G} \sum_{(u, v) \in \eta} \|u - v\|$. Augmenting a diagram $D$ with diagonal points leaves $\Psi_\sigma(D)$ unchanged. Denote $N_u(x) = \frac{1}{4\pi\sigma} e^{-\frac{\|x - u\|^2}{4\sigma}}$. Then
$\|\Psi_\sigma(F) - \Psi_\sigma(G)\|_{L_2(\Omega)} = \Big\| \sum_{(u, v) \in \eta} (N_u - N_{\bar{u}}) - (N_v - N_{\bar{v}}) \Big\|_{L_2(\Omega)} \le \sum_{(u, v) \in \eta} 2\, \|N_u - N_v\|_{L_2(\Omega)}$.

Robustness: Proof II
Using $\|N_u - N_v\|_{L_2(\Omega)} = \sqrt{\frac{1}{4\pi\sigma}\left(1 - e^{-\frac{\|u - v\|_2^2}{8\sigma}}\right)}$ we have
$\|\Psi_\sigma(F) - \Psi_\sigma(G)\|_{L_2(\Omega)} \le \frac{1}{\sqrt{\pi\sigma}} \sum_{(u, v) \in \eta} \sqrt{1 - e^{-\frac{\|u - v\|_2^2}{8\sigma}}}$.
By the convexity of $x \mapsto e^{-x}$, we have $e^{-\lambda x} \ge 1 - \lambda x$, and hence
$\|\Psi_\sigma(F) - \Psi_\sigma(G)\|_{L_2(\Omega)} \le \frac{1}{\sqrt{\pi\sigma}} \sum_{(u, v) \in \eta} \sqrt{\frac{\|u - v\|_2^2}{8\sigma}} = \frac{1}{\sigma\sqrt{8\pi}} \sum_{(u, v) \in \eta} \|u - v\|_2 \le \frac{1}{\sigma\sqrt{8\pi}}\, W_1(F, G)$.

Persistence landscapes: Definition
Introduced by Bubenik [Bubenik, 2013].
[Figure: a diagram point $(a, b)$ in (birth, death) coordinates is turned into the tent function $x \mapsto \max(0, \min(x - a, b - x))$; overlaying the tent functions of all diagram points, the landscape function $\lambda_k(x)$ is the $k$-th largest value at $x$, giving $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \ldots$]

Persistence landscapes: Definition
The persistence landscape of a persistence diagram can be interpreted as a function in $L_p(\mathbb{R}^2)$. Let $\Psi_L\colon \mathcal{D} \to L_p(\mathbb{R}^2)\colon D \mapsto \lambda(\cdot, \cdot)$ denote this embedding, where $\lambda(k, x) = \lambda_k(x)$ for $k = 1, 2, \ldots$
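For comparison purposes, here is a minimal sketch of the landscape functions evaluated on a grid, following the tent-function definition recalled above (illustrative only; not the authors' or Bubenik's code).

```python
# Sketch: landscape functions lambda_k of a diagram, sampled on a grid xs.
import numpy as np

def landscape(diagram, xs, k_max):
    """Return an array L with L[k-1, i] = lambda_k(xs[i])."""
    D = np.asarray(diagram, float).reshape(-1, 2)       # rows (a, b) = (birth, death)
    xs = np.asarray(xs, float)
    # tent function of (a, b): max(0, min(x - a, b - x))
    tents = np.minimum(xs[None, :] - D[:, :1], D[:, 1:] - xs[None, :])
    tents = np.maximum(tents, 0.0)                      # shape (num_points, len(xs))
    # pad with zero rows so that lambda_k = 0 once k exceeds the number of points
    if tents.shape[0] < k_max:
        pad = np.zeros((k_max - tents.shape[0], len(xs)))
        tents = np.vstack([tents, pad])
    tents.sort(axis=0)                                  # ascending per column
    return tents[::-1][:k_max]                          # k-th largest per column

xs = np.linspace(0.0, 2.0, 200)
L = landscape([(0.2, 1.5), (0.5, 0.9)], xs, k_max=3)
```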

Persistence landscapes: Stability
Lemma (Bottleneck stability). For two persistence diagrams $F$ and $G$ it holds that
$\|\Psi_L(F) - \Psi_L(G)\|_{L_\infty(\mathbb{R}^2)} \le W_\infty(F, G)$. (15)
Lemma (Weighted Wasserstein stability). For two persistence diagrams $F$ and $G$ it holds that
$\|\Psi_L(F) - \Psi_L(G)\|_{L_q(\mathbb{R}^2)} \le \sqrt[q]{\min_{\eta\colon F \to G} \sum_{(u, v) \in \eta} \left( \operatorname{pers}(u)\, \|u - v\|^q + \frac{2\, \|u - v\|^{q + 1}}{q + 1} \right)}$. (16)
Note that $\Psi_L\colon \mathcal{D} \to L_q(\mathbb{R}^2)$ is not a linear embedding.

Stability comparison: Experiment 1
Let $G = \emptyset$ and let $F_t = \{(-t, t)\}$, with $t \in \mathbb{R}_{\ge 0}$. Then:
$\|\Psi_L(F_t) - \Psi_L(G)\|_{L_q} \sim t^{1 + \frac{1}{q}}$
$W_q(F_t, G) \sim t$
$\|\Psi_\sigma(F_t) - \Psi_\sigma(G)\|_{L_2} = \sqrt{\frac{1}{4\pi\sigma}\left(1 - e^{-\frac{t^2}{8\sigma}}\right)}$, which stays bounded as $t \to \infty$.

Stability comparison: Experiment 2
Let $F_t = \{(-t, t)\}$ and $G_t = \{(-t + 1, t + 1)\}$, with $t \in \mathbb{R}_{\ge 0}$. Then:
$\|\Psi_L(F_t) - \Psi_L(G_t)\|_{L_q} \sim t^{\frac{1}{q}}$
$W_q(F_t, G_t) \in O(1)$
$\|\Psi_\sigma(F_t) - \Psi_\sigma(G_t)\|_{L_2} \to \sqrt{\frac{1}{4\pi\sigma}\left(1 - e^{-\frac{2}{8\sigma}}\right)}$ as $t \to \infty$, i.e., it remains $O(1)$.
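The limiting constant in the last row can be checked directly from Eq. (13); this short derivation sketch is not on the slide. For large $t$ both points lie far from the diagonal, so the mirrored terms in (13) vanish and, with $p_t = (-t, t)$, $q_t = (-t + 1, t + 1)$ and $\|p_t - q_t\|^2 = 1^2 + 1^2 = 2$,
$\|\Psi_\sigma(F_t) - \Psi_\sigma(G_t)\|_{L_2}^2 = k_\sigma(F_t, F_t) - 2\, k_\sigma(F_t, G_t) + k_\sigma(G_t, G_t) \longrightarrow \frac{1}{8\pi\sigma}\left(1 - 2\, e^{-\frac{2}{8\sigma}} + 1\right) = \frac{1}{4\pi\sigma}\left(1 - e^{-\frac{2}{8\sigma}}\right)$,
where $k_\sigma$ denotes the inner product of Eq. (13).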

Stability comparison: Experiment 3
Persistence landscapes emphasize high-persistence features. Noise at high-persistence points can hide significant mid-range features, making it hard to distinguish between Class 1 and Class 2.
[Figure: example diagrams for Class 1 and Class 2.]

Experiments: PSS- vs. landscape-kernel
SHREC 2014 dataset of shapes: 300 meshes, 20 classes (poses). Morse filtration of a heat-signature scalar field on each mesh, with an additional scale parameter ("frequency").
Two tasks:
Retrieval: get the nearest neighbor in Hilbert space.
Classification: train an SVM with the kernel.
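For the classification task, the following sketch shows one way to run the protocol with a precomputed Gram matrix; it assumes the pss_kernel function sketched after Eq. (13) is in scope, and the diagrams and labels below are toy placeholders for the real SHREC 2014 persistence diagrams and pose labels (not the authors' pipeline).

```python
# Sketch: SVM classification with a precomputed PSS Gram matrix and 5-fold CV.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# toy placeholders: 60 random diagrams of 5 (birth, death) points, 3 classes
diagrams = [np.sort(rng.uniform(0, 1, size=(5, 2)), axis=1) for _ in range(60)]
labels = np.repeat(np.arange(3), 20)

def gram_matrix(diags, sigma):
    n = len(diags)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = pss_kernel(diags[i], diags[j], sigma)
    return K

# SVC with kernel="precomputed" consumes the Gram matrix directly;
# scikit-learn slices it consistently during cross-validation.
K = gram_matrix(diagrams, sigma=0.1)
scores = cross_val_score(SVC(kernel="precomputed"), K, labels, cv=5)
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```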

Experiments: Retrieval
[Plot: nearest-neighbor retrieval rate (NN, between 40 and 80%) versus computation time ($10^{-8}$ to $10^{-1}$) for frequency 6, dimension 1, comparing the PSS kernel and the landscape kernel.]

Experiments: Classification

Freq. | PSS [%]      | Landscapes [%] | Diff.
1     | 95.67 ± 2.53 | 70.67 ± 3.65   | +25.00
2     | 99.33 ± 0.91 | 90.67 ± 3.03   | +8.67
3     | 96.67 ± 2.64 | 76.00 ± 3.84   | +20.67
4     | 97.67 ± 1.49 | 85.33 ± 6.39   | +12.33
5     | 97.33 ± 1.90 | 86.67 ± 2.04   | +10.67
6     | 93.67 ± 3.21 | 71.67 ± 6.56   | +22.00
7     | 89.33 ± 2.79 | 73.33 ± 7.17   | +16.00
8     | 88.67 ± 5.58 | 81.33 ± 4.31   | +7.33
9     | 90.67 ± 5.35 | 74.33 ± 7.32   | +16.33
10    | 92.00 ± 3.21 | 59.67 ± 5.19   | +32.33
11    | 92.00 ± 2.74 | 70.00 ± 6.56   | +22.00
12    | 86.67 ± 2.64 | 61.67 ± 8.08   | +25.00
13    | 86.67 ± 3.33 | 70.67 ± 6.93   | +16.00
14    | 89.67 ± 2.98 | 58.67 ± 7.49   | +31.00
15    | 87.33 ± 3.46 | 55.67 ± 8.04   | +31.67
16    | 82.33 ± 4.18 | 55.67 ± 6.08   | +26.67
17    | 86.00 ± 4.50 | 54.67 ± 5.70   | +31.33
18    | 85.67 ± 5.08 | 52.33 ± 4.94   | +33.33
19    | 79.00 ± 5.60 | 48.00 ± 5.19   | +31.00
20    | 72.33 ± 2.53 | 43.00 ± 3.21   | +29.33

Table: Comparison of the PSS kernel vs. the landscape kernel using 5-fold cross-validation on the SHREC 2014 (Synthetic) dataset for dimension 1.

Bibliography
Bubenik, P. (2013). Statistical topological data analysis using persistence landscapes. arXiv preprint, available at http://arxiv.org/abs/1207.6437.