A Gaussian Type Kernel for Persistence Diagrams


1 TRIPODS Summer Bootcamp: Topology and Machine Learning, Brown University, August 2018. A Gaussian Type Kernel for Persistence Diagrams. Mathieu Carrière, joint work with S. Oudot and M. Cuturi.

2 Persistence diagrams: point cloud → 1-parameter filtration (by scale) → barcode / diagram.

3 Persistence diagrams. Def: a persistence diagram is a finite multiset of points in the open half-plane above the diagonal (the figure labels its points with their multiplicities (2), (1), (1)).

4 Persistence diagrams. Def: a finite multiset of points in the open half-plane above the diagonal. Given a partial matching M : X ↔ Y between two diagrams:
- cost of a matched pair (x, y) ∈ M: c_p(x, y) := ‖x − y‖^p
- cost of an unmatched point z ∈ X ∪ Y: c_p(z) := ‖z − π_Δ(z)‖^p, where π_Δ(z) is the projection of z onto the diagonal Δ
- cost of M: c_p(M) := ( Σ_{(x,y) matched} c_p(x, y) + Σ_{z unmatched} c_p(z) )^{1/p}
p-th diagram distance (extended metric): d_p(X, Y) := inf_{M : X↔Y} c_p(M)
bottleneck distance: d_∞(X, Y) := lim_{p→∞} d_p(X, Y)

5 Persistence diagrams (same definitions as slide 4). Prop. [Cohen-Steiner, Edelsbrunner, Harer 2007]: the diagram distances are stable; d_p and d_∞ are related by a bound of the form d_p(D, D′) ≤ C^{1/p} · d_∞(D, D′)^{1 − 1/p} for a constant C depending on the diagrams.

6 Persistence diagrams (same content as slides 4 and 5). Remark: the d_p distances are similar to Wasserstein distances between probability measures... but not quite the same (different total masses, special role of the diagonal).
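
For small diagrams, d_p can be computed exactly by reducing the partial matching to a standard assignment problem. The following is a hedged sketch, not from the slides: the ℓ∞ ground norm and the toy diagrams are assumptions added on top of the definitions above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def diagram_distance(X, Y, p=1.0):
    """d_p between two diagrams (arrays of (birth, death) rows), via an assignment:
    each point may be matched either to a point of the other diagram or to the
    diagonal; diagonal-to-diagonal matches cost nothing.
    The ground norm is assumed to be l-infinity here (a common convention)."""
    n, m = len(X), len(Y)
    dX = np.abs(X[:, 1] - X[:, 0]) / 2.0      # l-inf distance of each point of X to the diagonal
    dY = np.abs(Y[:, 1] - Y[:, 0]) / 2.0
    C = np.zeros((n + m, n + m))
    C[:n, :m] = np.max(np.abs(X[:, None, :] - Y[None, :, :]), axis=-1) ** p  # point-to-point costs
    C[:n, m:] = dX[:, None] ** p              # a point of X sent to the diagonal
    C[n:, :m] = dY[None, :] ** p              # a point of Y sent to the diagonal
    rows, cols = linear_sum_assignment(C)     # optimal partial matching
    return C[rows, cols].sum() ** (1.0 / p)

X = np.array([[0.1, 0.6], [0.3, 0.5]])
Y = np.array([[0.12, 0.58], [0.4, 0.45], [0.2, 0.9]])
print(diagram_distance(X, Y, p=1.0))          # the 1st diagram distance d_1
```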

7 Persistence diagrams as descriptors for data. Pipeline: input data → domain / filter → persistence diagram (the figure shows 0-dimensional and 1-dimensional homology generators); the diagram can then serve as a descriptor. Pros: topological descriptors carry information of a different nature; strong invariance and stability properties; flexible and versatile.

8 Persistence diagrams as descriptors for data (Pros as on slide 7). Cons: the space of persistence diagrams is not a linear space, which implies that linear classifiers cannot be used (so far PDs have been mostly used in unsupervised settings): bad for learning and statistics. Moreover, diagrams are slow to compute and, more importantly, to compare (long construction times are fine as long as comparisons can be made fast, but here they cannot): bad for applications.

9 Persistence diagrams as descriptors for data (Pros and Cons as on slides 7 and 8). A solution: Hilbert space embedding (one possible solution among others).

10 Hilbert space embedding and kernel trick. (Figure: toy data with classes labelled −1 and +1, an SVM classifier and its support vectors.) Training is a convex optimization problem; testing is an immediate computation.

11 Hilbert space embedding and kernel trick. (Figure: a linear SVM that misclassifies points, versus a polynomial SVM obtained by mapping the data to a feature space.)

12 Hilbert space embedding and kernel trick. X: a space in which we want to compare/classify elements. Feature map φ : X → H, where H is equipped with an inner product ⟨·,·⟩_H. Lift the training/testing data to H through φ, then solve the learning problem in (H, ⟨·,·⟩_H).

13 Hilbert space embedding and kernel trick (same setup as slide 12). Observation: many learning methods only use the inner product, so there is no need to lift the data explicitly; it suffices to compute the kernel k(x, y) = ⟨φ(x), φ(y)⟩_H.

14 Hilbert space embedding and kernel trick. Def.: a reproducing kernel is a map k : X × X → R such that k(·,·) = ⟨φ(·), φ(·)⟩_H for some pair (φ, H). Thm. [Moore, Aronszajn]: such a pair (φ, H) exists (and is essentially unique) whenever k is positive semidefinite, i.e. for all n ∈ N and x_1, …, x_n ∈ X the Gram matrix (k(x_i, x_j))_{i,j} is positive semidefinite. H is called the Reproducing Kernel Hilbert Space (RKHS) of k.
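
The positive semidefiniteness condition is easy to probe numerically. Below is a small illustrative sketch, not from the slides (the point set and bandwidth are arbitrary), building the Gram matrix of a Gaussian kernel on random points and checking that its spectrum is nonnegative.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))        # 50 random points in R^3
K = gaussian_gram(X, sigma=0.7)
eigvals = np.linalg.eigvalsh(K)     # real spectrum of the symmetric Gram matrix
print(eigvals.min() >= -1e-10)      # True: the Gram matrix is (numerically) PSD
```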

15 Hilbert space embedding and kernel trick. Classification / regression: given input (x_1, y_1), …, (x_n, y_n) ∈ X × Y, solve
min_{w,b} (1/n) Σ_{i=1}^n L(y_i, ⟨w, x_i⟩ + b) + γ ‖w‖^p
(e.g. L(y, t) = max{0, 1 − t·y} for SVMs, p = 1 for sparsity). The regularization term γ ‖w‖^p prevents the optimal hyperplane from overfitting the training data.

16 Hilbert space embedding and kernel trick. Same problem with the data lifted through φ:
min_{w,b} (1/n) Σ_{i=1}^n L(y_i, ⟨w, φ(x_i)⟩ + b) + γ ‖w‖^p
(e.g. L(y, t) = max{0, 1 − t·y} for SVMs, p = 1 for sparsity).
Thm. (Representer) [Schölkopf, Herbrich, Smola]: the argmin w admits a representation of the form w = Σ_{j=1}^n α_j φ(x_j) = Σ_{j=1}^n α_j k(x_j, ·). Thanks to the representer theorem, the optimization can be restricted to the linear subspace span{φ(x_1), …, φ(x_n)}: search for the argmin there and optimize over the α_j.
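
To make this concrete, here is a hedged sketch (toy data, arbitrary labels and bandwidth, not from the slides) showing that a kernel SVM can be trained and applied from Gram matrices alone, for instance with scikit-learn's precomputed-kernel mode; the representation w = Σ_j α_j k(x_j, ·) stays implicit inside the solver.

```python
import numpy as np
from sklearn.svm import SVC

def gaussian_gram(A, B, sigma=1.0):
    """Cross Gram matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 2))
y_train = np.where(X_train[:, 0] * X_train[:, 1] > 0, 1, -1)   # toy XOR-like labels
X_test = rng.normal(size=(20, 2))

K_train = gaussian_gram(X_train, X_train)     # all the learner ever sees
K_test = gaussian_gram(X_test, X_train)       # rows: test points, columns: training points
clf = SVC(kernel="precomputed").fit(K_train, y_train)
y_pred = clf.predict(K_test)                  # predictions from inner products only
```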

17 Hilbert space embedding and kernel trick. Examples:
- linear kernel: k(x, y) = ⟨x, y⟩
- polynomial kernel: k(x, y) = (α ⟨x, y⟩ + 1)^β
- Gaussian kernel: k(x, y) = exp(−‖x − y‖₂² / (2σ²))
The Gaussian kernel is of the kind that depends only on the distance between the observations; it has many interesting properties, in particular an infinite-dimensional RKHS, and it is the Swiss army knife of kernel-based learning in practice.

18 Hilbert space embedding and kernel trick. Thm. [Berg, Christensen, Ressel 1984]: if d : X × X → R_+ is symmetric and conditionally negative semidefinite (cnsd), i.e.
for all n ∈ N, x_1, …, x_n ∈ X and α_1, …, α_n ∈ R, Σ_{i=1}^n α_i = 0 implies Σ_{i=1}^n Σ_{j=1}^n α_i α_j d(x_i, x_j) ≤ 0,
then k(x, y) := exp(−d(x, y) / (2σ²)) is positive semidefinite. (Typically, the Euclidean distance and its powers up to 2 (included) are cnsd.)
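
The cnsd condition can also be checked mechanically: a symmetric matrix of pairwise distances is cnsd exactly when it is negative semidefinite on the zero-sum subspace. A small illustrative check on random Euclidean data (the point cloud and sizes are arbitrary, not from the slides):

```python
import numpy as np

def is_cnsd(D, tol=1e-9):
    """D (symmetric) is cnsd iff -J D J is PSD, where J = I - (1/n) 1 1^T is the
    orthogonal projector onto the zero-sum subspace {a : sum(a) = 0}."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return np.linalg.eigvalsh(-J @ D @ J).min() >= -tol

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(is_cnsd(dists))        # True: the Euclidean distance is cnsd
print(is_cnsd(dists ** 2))   # True: its square is cnsd as well (powers up to 2)
```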

19 Hilbert space embedding and kernel trick (Berg-Christensen-Ressel theorem as on slide 18). Q: does this apply to persistence diagrams? Pb: d_p is not cnsd, for any p > 0.

20 Kernels for persistence diagrams. View persistence diagrams as:
- landscapes (collections of 1-d functions) [Bubenik 2012] [Bubenik, Dłotko 2015] (a landscape is a polyline with slopes ±1)
- discrete measures: histograms [Bendich et al. 2014]; convolution with a fixed kernel [Chepushtanova et al. 2015] (if you do the convolution naively as in [Chepushtanova et al. 2015] you don't get ...); convolution with a weighted kernel [Kusano, Fukumizu, Hiraoka]; heat diffusion [Reininghaus et al. 2015] + exponential [Kwitt et al. 2015]
- finite metric spaces [C., Oudot, Ovsjanikov 2015]
- polynomial roots or evaluations [Di Fabio, Ferri 2015] [Kališnik 2016]

21 Kernels for persistence diagrams. Comparison of the approaches (columns: landscapes / discrete measures / metric spaces / polynomials):
- ambient Hilbert space: L²(N × R) / L²(R²) / (R^d, ‖·‖₂) / ℓ²(R)
- stability: ‖φ(·) − φ(·)‖_H ≤ f(d_p)
- injectivity; lower bound ‖φ(·) − φ(·)‖_H ≥ g(d_p): ???
- algorithmic cost: O(n²) / O(n²) / feature map O(n²), kernel O(d) / feature map O(nd), kernel O(d)
- universality
- additivity
Notes: L²(N × R) is defined by taking the product of the counting measure on N and the Lebesgue measure on R. Attention: L² is not an RKHS, just an ambient Hilbert space in which the feature maps take their values.

22-23 Kernels for persistence diagrams (same comparison table as slide 21).

24 Kernels for persistence diagrams (same list of approaches as slide 20).

25 Persistence diagrams as discrete measures (I). Idea: treat diagrams as measures and take their densities as feature vectors (this builds the feature map, from which the kernel itself is then derived). A diagram D, with points in birth/death coordinates, becomes the discrete measure μ_D := Σ_{x∈D} δ_x.

26 Persistence diagrams as discrete measures (I). Pb: μ_D is unstable (points close to the diagonal can appear and disappear). Weighting: μ_D^w := Σ_{x∈D} w(x) δ_x, with w(x) := arctan(c · d(x, Δ)^r), c, r > 0.

27 Persistence diagrams as discrete measures (I). Convolution: smooth the weighted measure into μ_D^w ∗ N(0, σ). Def: φ(D) is the density function of μ_D^w ∗ N(0, σ) w.r.t. the Lebesgue measure:
φ(D) := (1 / (2πσ)) Σ_{x∈D} arctan(c · d(x, Δ)^r) exp(−‖· − x‖² / (2σ²))
k(D, D′) := ⟨φ(D), φ(D′)⟩_{L²(R × R_+)}
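
As an illustration of this feature map, here is a hedged sketch that evaluates φ(D) on a regular grid for a toy diagram; the grid, the bandwidth σ and the weight parameters c, r are arbitrary choices, and the 1/(2πσ) normalization simply follows the formula above.

```python
import numpy as np

def feature_map(diagram, grid_x, grid_y, sigma=0.1, c=1.0, r=1.0):
    """Discretized phi(D): sum over diagram points of
    arctan(c * d(x, diagonal)^r) times a Gaussian bump centered at x."""
    xs, ys = np.meshgrid(grid_x, grid_y, indexing="ij")
    phi = np.zeros_like(xs)
    for b, d in diagram:
        dist_to_diag = abs(d - b) / np.sqrt(2.0)   # Euclidean distance from (b, d) to the diagonal
        weight = np.arctan(c * dist_to_diag ** r)
        phi += weight * np.exp(-((xs - b) ** 2 + (ys - d) ** 2) / (2.0 * sigma ** 2))
    return phi / (2.0 * np.pi * sigma)

D = np.array([[0.1, 0.5], [0.2, 0.9], [0.4, 0.45]])  # toy diagram: rows are (birth, death)
grid = np.linspace(0.0, 1.0, 50)
phi_D = feature_map(D, grid, grid)                   # a 50 x 50 image-like feature vector
# k(D, D') is then approximated by the discretized L2 inner product of phi_D and phi_D'.
```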

28 Persistence diagrams as discrete measures (I) (same construction as slide 27). Prop. [Kusano, Fukumizu, Hiraoka]: ‖φ(D) − φ(D′)‖_H ≤ C · d_p(D, D′); moreover, φ is injective and exp(k) is universal. Pb: the convolution reduces discriminativity, because nearby points/modes start mixing up together; the weighting also reduces discriminativity. Idea: use the discrete measure itself instead.

29 Persistence diagrams as discrete measures (II). Smoothing into densities might be the reason for the lack of discriminativity, since it requires compromises; instead, we now deal with the discrete measure μ_D := Σ_{x∈D} δ_x directly. Pb: d_p(D, D′) cannot be related to W_p(μ_D, μ_{D′}); in fact W_p does not even make sense here, because the two discrete measures have different total masses.

30 Persistence diagrams as discrete measures (II). Fix: given D, D′, let
μ̃_D := Σ_{x∈D} δ_x + Σ_{y∈D′} δ_{π_Δ(y)} and μ̃_{D′} := Σ_{y∈D′} δ_y + Σ_{x∈D} δ_{π_Δ(x)},
where π_Δ denotes the orthogonal projection onto the diagonal Δ. Then d_p(D, D′) ≤ W_p(μ̃_D, μ̃_{D′}) ≤ 2 d_p(D, D′): this is a quasi-isometric embedding.

31 Persistence diagrams as discrete measures (II) (same construction as slide 30). Pb: μ̃_D depends on D′; this is bad in practice because, basically, μ̃_D has to be rebuilt for every diagram D is compared to.

32 Persistence diagrams as discrete measures (II). Solution: transfer the diagonal mass negatively to μ_D:
μ̄_D := Σ_{x∈D} δ_x − Σ_{x∈D} δ_{π_Δ(x)} ∈ M_0(R²),
a signed discrete measure of total mass zero (M_0(R²) is the space of signed measures on R² with total mass zero).

33 Persistence diagrams as discrete measures (II) (same construction as slide 32). Q: what metric should we use? The Wasserstein metric only makes sense for certain (positive, equal-mass) measures. A: the Kantorovich norm ‖·‖_K.

34 Persistence diagrams as discrete measures (II). Hahn decomposition thm.: for any μ ∈ M_0(X, Σ), there exist measurable sets P, N such that: (i) P ∪ N = X and P ∩ N = ∅; (ii) μ(B) ≥ 0 for every measurable set B ⊆ P; (iii) μ(B) ≤ 0 for every measurable set B ⊆ N. Moreover, the decomposition is essentially unique (unique up to μ-null sets). P is a positive set for μ and N is a negative set for μ.

35 Persistence diagrams as discrete measures (II) (Hahn decomposition as on slide 34). For B ∈ Σ, let μ⁺(B) := μ(B ∩ P) and μ⁻(B) := −μ(B ∩ N); μ⁺ is the positive part of μ, and μ⁺(X) = μ⁻(X) since μ has total mass zero. Def.: ‖μ‖_K := W₁(μ⁺, μ⁻). Prop.: for all μ, ν ∈ M_0(X), W₁(μ⁺ + ν⁻, ν⁺ + μ⁻) = ‖μ − ν‖_K.

36 Persistence diagrams as discrete measures (II) (Kantorovich norm as on slide 35). For persistence diagrams: W₁(μ̃_D, μ̃_{D′}) = ‖μ̄_D − μ̄_{D′}‖_K. We can therefore identify the space of persistence diagrams with a subset of the normed space (M_0(R²), ‖·‖_K).
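
For discrete diagrams this quantity is computable with an assignment solver. A hedged sketch (Euclidean ground cost and toy diagrams assumed, not taken from the slides) that evaluates W₁(μ_D + π_Δ#μ_{D′}, μ_{D′} + π_Δ#μ_D), which equals ‖μ̄_D − μ̄_{D′}‖_K by the proposition above:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def diag_proj(P):
    """Orthogonal projection of (birth, death) points onto the diagonal {y = x}."""
    m = (P[:, 0] + P[:, 1]) / 2.0
    return np.stack([m, m], axis=1)

def kantorovich_dist(D1, D2):
    """W1(mu_D1 + pi#mu_D2, mu_D2 + pi#mu_D1) = ||signed measure difference||_K.
    Both sides carry the same number of unit-mass atoms, so W1 reduces to an
    optimal assignment between the two atom sets."""
    pos = np.vstack([D1, diag_proj(D2)])
    neg = np.vstack([D2, diag_proj(D1)])
    cost = np.linalg.norm(pos[:, None, :] - neg[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].sum()

D1 = np.array([[0.1, 0.6], [0.3, 0.5]])
D2 = np.array([[0.15, 0.55], [0.2, 0.9], [0.4, 0.45]])
print(kantorovich_dist(D1, D2))
```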

37 A Wasserstein Gaussian kernel for PDs? (Berg-Christensen-Ressel theorem as on slide 18.) Pb: W₁ is not cnsd, and neither is d₁; indeed, it is easy to generate counterexamples by randomly sampling diagrams. Solutions: relax the measures (e.g. convolution), the path taken by [Aguey, Carlier] and [Oh...]; or relax the metric (e.g. regularization, slicing).

38 Sliced Wasserstein metric. Special case X = R: let μ, ν be discrete measures of mass n, μ := Σ_{i=1}^n δ_{x_i}, ν := Σ_{i=1}^n δ_{y_i}. Sort the atoms of μ and ν along the real line, so that x_i ≤ x_{i+1} and y_i ≤ y_{i+1} for all i (one can then see μ and ν as n-dimensional vectors). Then W₁(μ, ν) = Σ_{i=1}^n |x_i − y_i| = ‖(x_1, …, x_n) − (y_1, …, y_n)‖₁.

39 Sliced Wasserstein metric (same setting as slide 38). Consequence: on the real line, W₁ is cnsd and easy to compute (and the same holds for ‖·‖_K for signed measures).
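
A minimal sketch of this one-dimensional reduction (random atoms, purely illustrative):

```python
import numpy as np

def w1_on_line(xs, ys):
    """W1 between sum_i delta_{x_i} and sum_i delta_{y_i} (same number of unit-mass
    atoms): sort both lists and take the l1 distance between the sorted vectors."""
    return np.abs(np.sort(xs) - np.sort(ys)).sum()

rng = np.random.default_rng(3)
x, y = rng.normal(size=20), rng.normal(size=20)
print(w1_on_line(x, y))
```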

40 Sliced Wasserstein metric. Def (sliced Wasserstein distance): for μ, ν ∈ M_+(R²),
SW₁(μ, ν) := (1 / (2π)) ∫_{θ∈S¹} W₁(π_θ#μ, π_θ#ν) dθ,
where π_θ is the orthogonal projection onto the line through the origin with angle θ. (Note: a natural idea would be to integrate over all lines of the plane, i.e. over RP¹.)

41 Sliced Wasserstein metric (same definition as slide 40). Props (inherited from W₁ over R) [Rabin, Peyré, Delon, Bernot 2011]:
- satisfies the axioms of a metric
- well-defined barycenters, fast to compute via stochastic gradient descent, etc.
- conditionally negative semidefinite

42 Sliced Wasserstein kernel. Def: given σ > 0, for any μ, ν ∈ M_+(R²):
k_SW(μ, ν) := exp(−SW₁(μ, ν) / (2σ²)).
Corollary [Kolouri, Zou, Rohde]: k_SW is positive semidefinite (from SW₁ being cnsd and Berg's theorem).

43 Sliced Wasserstein kernel (definition as on slide 42). Application to persistence diagrams: set μ_D := Σ_{x∈D} δ_x and μ̄_D := μ_D − π_Δ#μ_D, and define
SW₁(D, D′) := ∫_{θ∈S¹} ‖π_θ#μ̄_D − π_θ#μ̄_{D′}‖_K dθ
(this is the same as SW₁(μ_D + π_Δ#μ_{D′}, μ_{D′} + π_Δ#μ_D));
k_SW(D, D′) := exp(−SW₁(D, D′) / (2σ²)).

44 Sliced Wasserstein kernel (same definitions as slide 43). Properties: positive semidefinite; simple and fast to compute (essentially sorting and vector norm computations).
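
A hedged sketch of how k_SW can be approximated in practice: the integral over directions is replaced by an average over finitely many angles. The toy diagrams, the number of directions and σ are arbitrary, and the overall scale of SW₁ depends on the normalization convention chosen for the angular measure.

```python
import numpy as np

def diag_proj(P):
    """Orthogonal projection of (birth, death) points onto the diagonal {y = x}."""
    m = (P[:, 0] + P[:, 1]) / 2.0
    return np.stack([m, m], axis=1)

def sliced_wasserstein(D1, D2, n_directions=100):
    """Approximate SW_1(D1, D2): augment each diagram with the diagonal projections
    of the other one, project onto angles theta in [0, pi), and average the sorted
    l1 distances (opposite directions give the same 1-d W1)."""
    A = np.vstack([D1, diag_proj(D2)])      # atoms of mu_D1 + pi_Delta # mu_D2
    B = np.vstack([D2, diag_proj(D1)])      # atoms of mu_D2 + pi_Delta # mu_D1
    thetas = np.linspace(0.0, np.pi, n_directions, endpoint=False)
    total = 0.0
    for t in thetas:
        u = np.array([np.cos(t), np.sin(t)])
        total += np.abs(np.sort(A @ u) - np.sort(B @ u)).sum()
    return total / n_directions             # average over directions

D1 = np.array([[0.1, 0.6], [0.3, 0.5]])
D2 = np.array([[0.15, 0.55], [0.2, 0.9], [0.4, 0.45]])
sw = sliced_wasserstein(D1, D2)
k_sw = np.exp(-sw / (2.0 * 0.5 ** 2))       # sliced Wasserstein kernel value with sigma = 0.5
```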

45 Sliced Wasserstein kernel (application to persistence diagrams as on slide 43). Thm. [C., Cuturi, Oudot]: the metrics d₁ and SW₁ on the space D_N of persistence diagrams of size bounded by N are strongly equivalent, namely, for all D, D′ ∈ D_N:
d₁(D, D′) / (N(2N − 1)) ≤ SW₁(D, D′) ≤ 2√2 · d₁(D, D′).

46 Sliced Wasserstein kernel (equivalence theorem as on slide 45). Corollary: the feature map φ associated with k_SW is weakly metric-preserving: there exist functions g, h, nonzero except at 0, such that g ∘ d₁ ≤ ‖φ(·) − φ(·)‖_H ≤ h ∘ d₁.

47 Sliced Wasserstein kernel (equivalence theorem and corollary as on slides 45-46). Thm. [C., Bauer]: no feature map φ : D_N → R^d can preserve the metric, whatever d and N are, i.e. either the lower bound is 0 or the upper bound is +∞.

48 Metric distortion in practice.

49 Application to supervised orbit classification. Goal: classify orbits of the linked twisted map, a discrete dynamical system modelling fluid flow dynamics. Orbits are generated by (depending on the parameter r):
x_{n+1} = x_n + r · y_n (1 − y_n) mod 1
y_{n+1} = y_n + r · x_{n+1} (1 − x_{n+1}) mod 1
The behavior of the orbits depends on r; each class label (1 to 5) corresponds to a different value of r. (Figure: example orbits for labels 1 to 5.)
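
A sketch of the orbit generation (the value of r, the orbit length and the random initial condition are illustrative choices; the slides' exact experimental parameters are not reproduced here):

```python
import numpy as np

def linked_twisted_map_orbit(r, n_points=1000, seed=0):
    """Generate one orbit of the linked twisted map from a random initial condition."""
    rng = np.random.default_rng(seed)
    x, y = rng.random(), rng.random()
    orbit = np.empty((n_points, 2))
    for i in range(n_points):
        x = (x + r * y * (1.0 - y)) % 1.0
        y = (y + r * x * (1.0 - x)) % 1.0   # uses the already-updated x, as in the recurrence
        orbit[i] = (x, y)
    return orbit

orbit = linked_twisted_map_orbit(r=4.1)     # a point cloud in [0, 1)^2 whose persistence
                                            # diagram is then fed to the kernels above
```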

50 Application to supervised orbit classification. Accuracies (%) using only TDA descriptors (kernels on barcodes), on the Orbit dataset: k_PSS 64.0 ± ..., k_PWG ... ± ..., k_SW ... ± 1.1 (PDs as discrete measures).

51 Application to supervised orbit classification (accuracies as on slide 50). Running times (in seconds, over an N-sized parameter space, from 100 orbits): k_PSS ... ± 65.6, k_PWG 69.2 ± ..., k_SW ... For k_PSS and k_PWG, φ(·) is recomputed for each σ, whereas SW₁ is computed only once.

52 Application to supervised shape segmentation. Goal: segment 3d shapes based on examples. Approach: train a (multiclass) classifier on PDs extracted from the training shapes, then apply the classifier to PDs extracted from the query shape. (Figure: training shapes labelled Head, Torso, Hand, Foot; a test shape with Label = ?.)
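
A self-contained mechanics sketch of this approach, with toy random diagrams and labels standing in for the real per-point diagrams and segment labels (the sliced Wasserstein approximation and σ follow the earlier sketches): precompute the k_SW Gram matrices and train a multiclass SVM with a precomputed kernel.

```python
import numpy as np
from sklearn.svm import SVC

def diag_proj(P):
    m = (P[:, 0] + P[:, 1]) / 2.0
    return np.stack([m, m], axis=1)

def sw1(D1, D2, n_dir=50):
    """Approximate sliced Wasserstein distance between two diagrams (cf. slide 43)."""
    A, B = np.vstack([D1, diag_proj(D2)]), np.vstack([D2, diag_proj(D1)])
    total = 0.0
    for t in np.linspace(0.0, np.pi, n_dir, endpoint=False):
        u = np.array([np.cos(t), np.sin(t)])
        total += np.abs(np.sort(A @ u) - np.sort(B @ u)).sum()
    return total / n_dir

def sw_gram(Ds_a, Ds_b, sigma=0.5):
    """Gram matrix of the sliced Wasserstein kernel between two lists of diagrams."""
    return np.array([[np.exp(-sw1(Da, Db) / (2.0 * sigma ** 2)) for Db in Ds_b] for Da in Ds_a])

# Toy stand-ins for the per-point diagrams and segment labels of the real pipeline.
rng = np.random.default_rng(4)
def toy_diagram():
    b = rng.random(5)
    return np.stack([b, b + rng.random(5)], axis=1)

train_dgms = [toy_diagram() for _ in range(30)]
train_labels = rng.integers(0, 4, size=30)              # 4 classes, e.g. head/torso/hand/foot
query_dgms = [toy_diagram() for _ in range(5)]

clf = SVC(kernel="precomputed").fit(sw_gram(train_dgms, train_dgms), train_labels)
pred = clf.predict(sw_gram(query_dgms, train_dgms))     # predicted segment label per query PD
```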

53 Application to supervised shape segmentation. Accuracies (%) using TDA descriptors (kernels on barcodes):
         Human         Airplane   Ant    FourLeg   Octopus   Bird   Fish
k_PSS    68.5 ± ...    ...        ...    ...       ...       ...    ... ± 1.6
k_PWG    64.2 ± ...    ...        ...    ...       ...       ...    ... ± 0.5
k_SW     74.0 ± ...    ...        ...    ...       ...       ...    ... ± ...

54 Recap. Pipeline: input data → domain / filter → persistence diagram (0- and 1-dimensional homology generators) → signed measure (points δ_x minus their diagonal projections δ_{π_Δ(x)}). Kernels for persistence diagrams:
- stable
- additive, universal, etc.
- easy to compute
- distance-preserving (approximately)
- (approximate) Gaussian kernel

55 Recap. Theoretical framework: quasi-isometric embedding of diagrams as signed measures; Wasserstein metric relaxation (sliced Wasserstein), which is cnsd. Thank you!
