Learning from persistence diagrams

Ulrich Bauer (TUM)
July 7, 2015, ACAT final meeting, IST Austria
Joint work with: Jan Reininghaus, Stefan Huber, Roland Kwitt
[Figure: $\delta$-neighborhoods of a point sample at scales $\delta$ = 0.1, 0.2, 0.4, 0.8]
Homology inference

Problem (Homological reconstruction). Given a finite sample $P \subset \Omega$, construct a shape $X$ that is geometrically close to $\Omega$ and satisfies $H_*(X) \cong H_*(\Omega)$.

Problem (Homology inference). Determine the homology $H_*(\Omega)$ of a shape $\Omega \subseteq \mathbb{R}^d$ from a finite sample $P \subset \Omega$.
Homology reconstruction using union of balls

$B_\delta(P)$: $\delta$-neighborhood of $P$

Theorem (Niyogi, Smale, Weinberger 2006). Let $\Omega$ be a submanifold of $\mathbb{R}^d$. Let $P \subset \Omega$ be such that $\Omega \subseteq B_\delta(P)$ and $\delta$ satisfies a certain (strong) sampling condition $\delta < C(\Omega)$. Then $H_*(\Omega) \cong H_*(B_\delta(P))$.
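To make the theorem concrete: by the nerve theorem, the union of balls $B_\delta(P)$ is homotopy equivalent to the alpha complex of $P$ at scale $\delta$, so its homology can be computed combinatorially. A minimal sketch, assuming the GUDHI Python library (not part of the talk); points, radius, and expected output are illustrative only:

```python
# Betti numbers of a union of balls via the alpha complex (assumes GUDHI).
import numpy as np
import gudhi

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)
P = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(0, 0.05, (100, 2))  # noisy circle

delta = 0.3  # ball radius; the alpha complex is filtered by squared radii
st = gudhi.AlphaComplex(points=P).create_simplex_tree(max_alpha_square=delta**2)
st.compute_persistence()
print(st.betti_numbers())  # expected [1, 1] for a well-sampled circle
```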
[Figure: union of balls around the sample, growing through the scales 0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
Homology inference using persistent homology

Theorem (Cohen-Steiner, Edelsbrunner, Harer 2005). Let $\Omega \subseteq \mathbb{R}^d$, and let $P$ be a finite sample such that
- $\Omega \subseteq B_\delta(P)$ for some $\delta > 0$ (sampling density),
- $P \subseteq B_\epsilon(\Omega)$ for some $\epsilon > 0$ (sampling error),
- $H_*(\Omega \hookrightarrow B_{\delta+\epsilon}(\Omega))$ is an isomorphism, and
- $H_*(B_{\delta+\epsilon}(\Omega) \hookrightarrow B_{2(\delta+\epsilon)}(\Omega))$ is a monomorphism.
Then $H_*(\Omega) \cong \operatorname{im} H_*(B_\delta(P) \hookrightarrow B_{2\delta+\epsilon}(P))$.
The pipeline of topological data analysis

Data (point cloud) → distance → Geometry (function) → sublevel sets → Topology (topological spaces) → homology → Algebra (vector spaces) → barcode → Combinatorics (intervals)
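A minimal end-to-end sketch of this pipeline, assuming the GUDHI library (any persistent homology package would do); the point cloud and Rips parameters are illustrative only:

```python
# Point cloud -> distance-based filtration -> homology -> barcode (assumes GUDHI).
import numpy as np
import gudhi

rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 60)
P = np.c_[np.cos(theta), np.sin(theta)]          # sample of the unit circle

rips = gudhi.RipsComplex(points=P, max_edge_length=1.2)
st = rips.create_simplex_tree(max_dimension=2)
barcode = st.persistence()                        # list of (dimension, (birth, death))
for dim, (b, d) in barcode:
    print(f"H_{dim}: [{b:.2f}, {d:.2f})")
```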
Stability of persistence barcodes for functions

Theorem (Cohen-Steiner, Edelsbrunner, Harer 2005). If two functions $f, g : K \to \mathbb{R}$ have distance $\|f - g\|_\infty \le \delta$, then there exists a $\delta$-matching of their barcodes.

- Matching of sets $X$, $Y$: a bijection of subsets $X' \subseteq X$, $Y' \subseteq Y$.
- $\delta$-matching of barcodes: matched intervals have endpoints within distance $\delta$; unmatched intervals have length $\le 2\delta$.
- Bottleneck distance $d_B$: the infimum $\delta$ admitting a $\delta$-matching.
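A toy check of this definition, assuming the GUDHI library; diagrams are lists of (birth, death) pairs of my own choosing:

```python
# Bottleneck distance: the smallest delta admitting a delta-matching (assumes GUDHI).
import gudhi

D_f = [(0.0, 1.0), (0.2, 0.5)]
D_g = [(0.1, 1.1)]

# Best matching: (0.0, 1.0) <-> (0.1, 1.1) moves endpoints by 0.1;
# (0.2, 0.5) stays unmatched and contributes half its length, 0.15.
print(gudhi.bottleneck_distance(D_f, D_g))  # 0.15
```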
Persistence of functions

Instead of point clouds, we can also study the persistence barcode of a function. Example: a surface mesh of the corpus callosum, where the surface is filtered by the sublevel sets of a function defined on it.
Persistence barcodes and persistence diagrams

[Figure: a persistence barcode and the corresponding persistence diagram, over the value range −1.2 to 1.2]
Topological machine learning

Tasks: shape retrieval, object recognition, texture recognition

Topological data analysis → kernel → Machine learning (SVM, PCA, k-means)
Linear machine learning methods

Many machine learning methods solve linear geometric problems on vector spaces:
- Principal component analysis (PCA): find orthogonal directions of largest variance.
- Support vector machines (SVM): find the best-separating hyperplane.
Kernels and kernel methods

Let $X$ be a set, and $H$ a vector space with inner product $\langle \cdot, \cdot \rangle_H$. Consider a map $\Phi : X \to H$. Then $k(\cdot, \cdot) = \langle \Phi(\cdot), \Phi(\cdot) \rangle_H : X \times X \to \mathbb{R}$ is a kernel (and $\Phi$ is the associated feature map).

For many linear methods:
- the feature map $\Phi$ is not required explicitly,
- no coordinate basis for $H$ is required,
- computations can be performed by evaluating $k(\cdot, \cdot)$.

This is called the kernel trick.
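A small illustration of the kernel trick, assuming scikit-learn: the SVM sees only the Gram matrix of kernel evaluations, never the feature map $\Phi$. The Gaussian kernel and the toy data here are stand-ins of my own choosing:

```python
# Kernel SVM from a precomputed Gram matrix (assumes scikit-learn).
import numpy as np
from sklearn.svm import SVC

def k(x, y, sigma=1.0):
    """Any positive-definite kernel works; here a Gaussian on R^d."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2))

X = np.array([[0.0], [0.2], [1.0], [1.2]])
y = np.array([0, 0, 1, 1])

gram = np.array([[k(a, b) for b in X] for a in X])       # K[i, j] = k(x_i, x_j)
clf = SVC(kernel="precomputed").fit(gram, y)

X_new = np.array([[0.1], [1.1]])
gram_new = np.array([[k(a, b) for b in X] for a in X_new])  # rows: test, cols: train
print(clf.predict(gram_new))  # [0 1]
```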
A feature map for persistence diagrams
Smoothing persistence diagrams

Define the domain $\Omega = \{(b, d) : b \le d\}$.

For a persistence diagram $D$, consider the solution $u : \Omega \times [0, \infty) \to \mathbb{R}$, $(x, t) \mapsto u(x, t)$, of the heat equation
$$\partial_t u = \Delta_x u \ \text{ in } \Omega \times (0, \infty), \qquad u = 0 \ \text{ on } \partial\Omega \times (0, \infty), \qquad u = \sum_{y \in D} \delta_y \ \text{ on } \Omega \times \{0\}.$$

For $\sigma > 0$, define the feature map $\Phi_\sigma(D) = u|_{t=\sigma}$.

This yields the persistence scale space (PSS) kernel
$$k_\sigma(D_f, D_g) = \langle \Phi_\sigma(D_f), \Phi_\sigma(D_g) \rangle_{L^2(\Omega)}.$$
Extending the domain

- Extend the domain from $\Omega$ to $\mathbb{R}^2$.
- Remove the boundary condition and change the initial condition:
$$\partial_t u = \Delta_x u \ \text{ in } \mathbb{R}^2 \times (0, \infty), \qquad u = \sum_{y \in D} (\delta_y - \delta_{\bar{y}}) \ \text{ on } \mathbb{R}^2 \times \{0\},$$
where $\bar{y} = (b, a)$ is $y = (a, b)$ reflected at the diagonal.
- Restricting to $\Omega$ yields a solution of the original equation: the initial condition is odd under reflection at the diagonal, so the solution vanishes on the diagonal for all times.
A closed-form solution

Closed-form solution (convolve with a Gaussian):
$$u(x, t) = \frac{1}{4\pi t} \sum_{y \in D} \left[ \exp\left(-\frac{\|x - y\|^2}{4t}\right) - \exp\left(-\frac{\|x - \bar{y}\|^2}{4t}\right) \right].$$

Closed-form expression for the PSS kernel:
$$k_\sigma(D_f, D_g) = \frac{1}{8\pi\sigma} \sum_{y \in D_f} \sum_{z \in D_g} \left[ \exp\left(-\frac{\|y - z\|^2}{8\sigma}\right) - \exp\left(-\frac{\|y - \bar{z}\|^2}{8\sigma}\right) \right].$$
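The closed form translates directly into code. A sketch in NumPy (function and variable names are mine); diagrams are arrays of (birth, death) points:

```python
# Direct transcription of the closed-form PSS kernel k_sigma.
import numpy as np

def pss_kernel(D_f, D_g, sigma):
    D_f, D_g = np.asarray(D_f, float), np.asarray(D_g, float)
    D_g_bar = D_g[:, ::-1]                             # reflections z_bar = (d, b)
    sq = np.sum((D_f[:, None, :] - D_g[None, :, :]) ** 2, axis=-1)      # ||y - z||^2
    sq_bar = np.sum((D_f[:, None, :] - D_g_bar[None, :, :]) ** 2, axis=-1)
    return np.sum(np.exp(-sq / (8 * sigma))
                  - np.exp(-sq_bar / (8 * sigma))) / (8 * np.pi * sigma)

D_f = [(0.0, 1.0), (0.2, 0.5)]
D_g = [(0.1, 1.1)]
print(pss_kernel(D_f, D_g, sigma=0.5))
```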
Stability

Let $\Phi_\sigma(D_f)$ be the smoothing of a persistence diagram $D_f$.

Theorem. For two persistence diagrams $D_f$ and $D_g$ and $\sigma > 0$, we have
$$\|\Phi_\sigma(D_f) - \Phi_\sigma(D_g)\|_{L^2} \le C_\sigma \, d_{W_1}(D_f, D_g).$$

Here: stability with respect to $d_{W_1}(D_f, D_g)$, where
$$d_{W_p}(D_f, D_g) = \left( \inf_{\mu : D_f \to D_g} \sum_{x \in D_f} \|x - \mu(x)\|^p \right)^{1/p}.$$

Note: bottleneck distance $d_B(D_f, D_g) = \lim_{p \to \infty} d_{W_p}(D_f, D_g)$.
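For intuition, $d_{W_1}$ between two small diagrams can be computed as an optimal assignment in which every point may also be matched to its diagonal projection. A sketch assuming SciPy, with Euclidean ground metric:

```python
# 1-Wasserstein distance between diagrams via optimal assignment (assumes SciPy).
import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein_1(D_f, D_g):
    D_f, D_g = np.asarray(D_f, float), np.asarray(D_g, float)
    n, m = len(D_f), len(D_g)
    # Euclidean distance of each point to the diagonal: |d - b| / sqrt(2)
    diag_f = np.abs(D_f[:, 1] - D_f[:, 0]) / np.sqrt(2)
    diag_g = np.abs(D_g[:, 1] - D_g[:, 0]) / np.sqrt(2)
    # Pad the cost matrix with virtual diagonal slots for unmatched points.
    C = np.zeros((n + m, n + m))
    C[:n, :m] = np.linalg.norm(D_f[:, None] - D_g[None, :], axis=-1)
    C[:n, m:] = diag_f[:, None]    # match a point of D_f to the diagonal
    C[n:, :m] = diag_g[None, :]    # match a point of D_g to the diagonal
    # The bottom-right block stays 0: diagonal-to-diagonal matches are free.
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum()

print(wasserstein_1([(0.0, 1.0), (0.2, 0.5)], [(0.1, 1.1)]))  # ~0.354
```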
Persistence landscapes

Introduced by Bubenik et al. (2013).

For each point $(a, b)$ of the persistence diagram, consider the tent function that rises from $a$, peaks at the midpoint of $[a, b]$, and falls back to zero at $b$. The landscape function $\lambda_k(x)$ is the $k$-th largest value of these tent functions at $x$, so $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \dots$

[Figure: a diagram point $(a, b)$ in birth/death coordinates and the resulting landscape functions $\lambda_1$, $\lambda_2$, $\lambda_3$ over $[a, b]$]
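A sketch of this construction in NumPy (function and variable names are mine): evaluate all tent functions on a grid and take pointwise order statistics:

```python
# Landscape functions lambda_k evaluated on a grid of x values.
import numpy as np

def landscapes(diagram, xs, k_max=3):
    D = np.asarray(diagram, float)
    a, b = D[:, 0][:, None], D[:, 1][:, None]              # births, deaths
    # Tent function of (a, b): max(0, min(x - a, b - x)).
    tents = np.maximum(np.minimum(xs[None, :] - a, b - xs[None, :]), 0.0)
    tents = -np.sort(-tents, axis=0)                       # descending per column
    out = np.zeros((k_max, len(xs)))
    k = min(k_max, len(D))
    out[:k] = tents[:k]                                    # row k-1 is lambda_k
    return out

xs = np.linspace(0.0, 1.5, 7)
print(landscapes([(0.0, 1.0), (0.2, 0.5)], xs))
```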
Persistence landscapes

The persistence landscape of a persistence diagram $D_f$ can be interpreted as a function $\Phi_L(D_f) = \lambda(k, x)$ in $L^p(\mathbb{R}^2)$, with $k = 1, 2, \dots$ and $x \in \mathbb{R}$.

Stability: $\|\Phi_L(D_f) - \Phi_L(D_g)\|_\infty \le d_B(D_f, D_g)$

But: no such stability for $\|\Phi_\sigma(D_f) - \Phi_\sigma(D_g)\|_{L^2}$.
Smoothed persistence diagrams vs. landscapes

- SHREC 2014 dataset of shapes: 300 surfaces, 20 classes (poses)
- Sublevel set filtration of the heat kernel signature (HKS) on each surface
- Additional scale parameter $t$ of the HKS
- Tasks:
  - Retrieval: find the nearest neighbor in the $L^2$ distance.
  - Classification: train an SVM with the kernel.
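For the retrieval task, the kernel induces a distance between diagrams via $d_k(x, y)^2 = k(x, x) - 2k(x, y) + k(y, y)$, the $L^2$ distance between the implicit feature maps. A sketch reusing the pss_kernel function from above; the database entries are toy diagrams, not the SHREC data:

```python
# Nearest-neighbor retrieval in the kernel-induced distance.
import numpy as np

def kernel_distance(D_f, D_g, sigma=0.5):
    # d_k^2 = k(f, f) - 2 k(f, g) + k(g, g); clamp against rounding error.
    return np.sqrt(max(pss_kernel(D_f, D_f, sigma)
                       - 2 * pss_kernel(D_f, D_g, sigma)
                       + pss_kernel(D_g, D_g, sigma), 0.0))

database = {"pose_a": [(0.0, 1.0)], "pose_b": [(0.0, 0.3), (0.4, 0.6)]}
query = [(0.05, 0.95)]
print(min(database, key=lambda name: kernel_distance(query, database[name])))
```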
Experiments: Retrieval

           SHREC 2014 (synthetic)         SHREC 2014 (real)
HKS t_i    d_kL    d_kσ       Δ          d_kL    d_kσ       Δ
t_1        53.3    88.7    +35.4         24.0    23.7     -0.3
t_2        91.0    94.7     +3.7         20.5    25.7     +5.2
t_3        76.7    91.3    +14.6         16.0    18.5     +2.5
t_4        84.3    93.0     +8.7         26.8    33.0     +6.2
t_5        85.0    92.3     +7.3         28.0    38.7    +10.7
t_6        63.0    77.3    +14.3         28.7    36.8     +8.1
t_7        65.0    80.0    +15.0         43.5    52.7     +9.2
t_8        73.3    80.7     +7.4         70.0    58.2    -11.8
t_9        73.0    83.0    +10.0         45.2    56.7    +11.5
t_10       51.3    69.3    +18.0          3.5    44.0    +40.5

Top 3 (2014): 99.3 / 92.3 / 91.0         68.5 / 59.8 / 58.3

Table: Nearest-neighbor retrieval performance (%). Left: SHREC 2014 (synthetic); right: SHREC 2014 (real).
Experiments: Classification

HKS t_i      k_L            k_σ             Δ
t_1       68.0 ± 3.2     94.7 ± 5.1     +26.7
t_2       88.3 ± 3.3     99.3 ± 0.9     +11.0
t_3       61.7 ± 3.1     96.3 ± 2.2     +34.7
t_4       81.0 ± 6.5     97.3 ± 1.9     +16.3
t_5       84.7 ± 1.8     96.3 ± 2.5     +11.7
t_6       70.0 ± 7.0     93.7 ± 3.2     +23.7
t_7       73.0 ± 9.5     88.0 ± 4.5     +15.0
t_8       81.0 ± 3.8     88.3 ± 6.0      +7.3
t_9       67.3 ± 7.4     88.0 ± 5.8     +20.7
t_10      55.3 ± 3.6     91.0 ± 4.0     +35.7

Table: Classification performance (%) on SHREC 2014 (synthetic).
Experiments: Classification

HKS t_i      k_L            k_σ             Δ
t_1       45.2 ± 5.8     48.8 ± 4.9      +3.5
t_2       31.0 ± 4.8     46.5 ± 5.3     +15.5
t_3       30.0 ± 7.3     37.8 ± 8.2      +7.8
t_4       41.2 ± 2.2     50.2 ± 5.4      +9.0
t_5       46.2 ± 5.8     62.5 ± 2.0     +16.2
t_6       33.2 ± 4.1     58.0 ± 4.0     +24.7
t_7       31.0 ± 5.7     62.7 ± 4.6     +31.7
t_8       51.7 ± 2.9     57.5 ± 4.2      +5.8
t_9       36.0 ± 5.3     41.2 ± 4.9      +5.2
t_10       2.8 ± 0.6     27.8 ± 5.8     +25.0

Table: Classification performance (%) on SHREC 2014 (real).