Distributions of Persistence Diagrams and Approximations
1 Distributions of Persistence Diagrams and Approximations Vasileios Maroulas Department of Mathematics Department of Business Analytics and Statistics University of Tennessee August 31, 2018
2 Thanks V. Maroulas (UTK) KDE of random persistence diagrams August 31, / 45
3 Joint with: Josh Mike (now at Michigan State), Andrew Marchese (now at Plated), John Sgouralis (now at Arizona State), Chris Oballe. Funded by: (sponsor logos)
4 Acoustic Signals at the ARL Two classes representing two different types of weapons. The goal is to help military officers make tactical decisions based on the type of weapon system.
5 Signals from ARL Dataset
6 Merge statistics and topology to understand the geometry of signals and classify them. TDA has recently been introduced to the field of signal and time-series classification: biological signals (Zhang et al. (2015)), action recognition (Venkataraman et al. (2016)), wheeze detection (Emrani et al. (2014)).
7 Motivation Data has shape, and shape matters: latent topological features in scientific data. VM and A. Nebenführ, Tracking rapid intracellular movements: a Bayesian random set approach, Annals of Applied Statistics. I. Sgouralis, A. Nebenführ and VM, A Bayesian Topological Framework for the Identification and Reconstruction of Subcellular Motion, SIAM Journal on Imaging Sciences. J. Mike, C. Sumrall, VM and F. Schwartz, Non-Landmark Classification in Paleobiology, Paleobiology.
8 Key Picture Data lies in a topological space. Take measurements, sampling that space. Reconstruct it by using an approximation. Compute the invariants to understand it.
9 From Signals to Point Clouds (Takens' Theorem) Suppose w : [0, T] → R is the realization of a discrete time series on [0, T]. Consider a set of delay indices τ_1, τ_2, ..., τ_{n−1}. The n-dimensional delay embedding of w is the concatenation of time-delayed samples: W(t) = (w(t), w(t + τ_1), w(t + τ_2), ..., w(t + τ_{n−1})). (1) Figure: Signal evolution in the time domain. Figure: 3D delay embedding.
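The embedding of Eq. (1) can be sketched in a few lines of Python (an illustrative helper written for this transcription, not the speaker's code; NumPy assumed):

```python
import numpy as np

def delay_embedding(w, taus):
    """Build W(t) = (w(t), w(t + tau_1), ..., w(t + tau_{n-1})) from Eq. (1).

    w    : 1-D array of time-series samples.
    taus : delay indices tau_1 < ... < tau_{n-1}, in samples.
    Returns an array of shape (T - max(taus), n): the rows are the
    points of the resulting point cloud.
    """
    T = len(w) - max(taus)
    cols = [w[:T]] + [w[tau:tau + T] for tau in taus]
    return np.column_stack(cols)

# A sampled sine embeds to a closed loop in 3D -- a topological circle.
t = np.arange(0, 200)
signal = np.sin(2 * np.pi * t / 50)
cloud = delay_embedding(signal, taus=[12, 25])
```

With the delays chosen near a quarter period, the embedded sine traces a closed loop, the kind of topological circle that persistent homology later detects.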
10 From Point Clouds to Inference and Classification We have now turned our time series into a point cloud living in N-dimensional space. How can we extract information from this data and use it for classification and statistical inference?
11 Outline Data Analysis using Persistent Homology; Distributions of Random Persistence Diagrams; Kernel Density Estimation; Conclusion.
12 Simplicial Complex Simplicial complexes are discretizations of real-life shapes: a generalization of graphs with higher-order relationships among the nodes. A simplicial complex is a union of simple pieces (simplices), i.e. vertices, edges, triangles, etc. The faces of a k-simplex are the (k−1)-simplices it contains. Two simplices must intersect at a common face or not at all.
13 Construction of Simplicial Complexes for Data Start with a point cloud Π and create one vertex for each point of Π. Figure: Left: Point Cloud; Right: Simplicial Complex
14 Construction of Simplicial Complexes for Data Create "spheres" of radius r centered at each point. Figure: Left: Point Cloud; Right: Simplicial Complex
15 Construction of Simplicial Complexes for Data Increase the radius r. Figure: Left: Point Cloud; Right: Simplicial Complex
16 Construction of Simplicial Complexes for Data Add an edge between vertices v_i and v_j if the corresponding circles intersect. Figure: Left: Point Cloud; Right: Simplicial Complex
17 Construction of Simplicial Complexes for Data Add a triangle between vertices v_i, v_j and v_k if all three circles intersect, etc. Figure: Left: Point Cloud; Right: Simplicial Complex
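The rules above (balls of radius r intersect pairwise iff the centers are within distance 2r; fill a triangle when its three edges are present) amount to a Vietoris–Rips-style construction. A minimal illustrative sketch, not the speaker's code (NumPy assumed):

```python
import numpy as np
from itertools import combinations

def rips_complex(points, r):
    """Vietoris-Rips-style complex at scale r: an edge joins two points whose
    radius-r balls intersect (distance <= 2r); a triangle is filled whenever
    all three of its edges are present."""
    n = len(points)
    # pairwise Euclidean distance matrix via broadcasting
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    edges = [(i, j) for i, j in combinations(range(n), 2) if d[i, j] <= 2 * r]
    eset = set(edges)
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if {(i, j), (i, k), (j, k)} <= eset]
    return edges, triangles

# Three points at pairwise distance 1 (an equilateral triangle).
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
```

At r = 0.4 the unit-spaced points yield no edges; at r = 0.6 all three edges and the filled triangle appear, which is exactly the growing-radius behavior the slides illustrate.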
18 Based on the simplicial complex we retrieve the Betti numbers (the dimensions of certain vector spaces associated to our topological space). Betti 0: number of "clusters". Betti 1: number of holes. Figure: β_0 = 2, β_1 = 0
19 Based on the simplicial complex we retrieve the Betti numbers (the dimensions of certain vector spaces associated to our topological space). Betti 0: number of "clusters". Betti 1: number of holes. Figure: β_0 = 2, β_1 = 0. Figure: β_0 = 2, β_1 = 1
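Betti 0 is just the number of connected components of the complex's 1-skeleton, which a union-find pass over the edges can count (a toy sketch written for this transcription; higher Betti numbers require an actual homology computation):

```python
def betti0(n_vertices, edges):
    """Count connected components (Betti 0) of a complex's 1-skeleton
    using union-find with path compression."""
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)  # merge the two components
    return len({find(v) for v in range(n_vertices)})
```

For example, a path 0-1-2 plus a separate pair 3-4 gives betti0(5, [(0, 1), (1, 2), (3, 4)]) == 2, matching the two-cluster figure.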
20 Persistence Diagrams We are interested in the persistence of the Betti numbers: when do the different connected components/holes form, and how long do they last (with respect to r)? The Betti numbers are compactly encoded in a 2-dimensional plot of the birth time vs. the death time of these features.
21 Results on Signals from ARL Dataset Figure: panels (a)-(f). A. Marchese and VM. Signal classification with a point process distance on the space of persistence diagrams. Advances in Data Analysis and Classification, pp. 1-26, 2017.
22 Classifier
23 Statistics and Persistence Diagrams Summary statistics such as center and variance (Bobrowski et al., 2014; Mileyko et al., 2011; Turner et al., 2014; Marchese and VM, 2017). Birth and death estimates (Emmett et al., 2014). Confidence sets (Fasy et al., 2014). We need a framework to understand the above summary statistics through a single viewpoint.
24 Novel Framework A complete and consistent framework for constructing distributions of persistence diagrams. Capture the important information of these diagrams in terms of their inherent set properties: set membership and cardinality.
25 Setup Take data X = {x_j} generated by some random process, with an associated (random) persistence diagram D whose features ξ_i = (b_i, d_i) record that a "hole" appears at scale b_i and is filled at scale d_i.
26 Lemma 2.1 (J. Mike & VM, 2018) Consider a multiset of independent singleton random persistence diagrams {D^j}_{j=1}^M. If each singleton D^j is described by the probability q^(j) = P[D^j ≠ ∅] and the conditional pdf p^(j)(ξ) given |D^j| = 1, then the global pdf for D = ∪_{j=1}^M D^j is given by

f_D(ξ_1, ..., ξ_N) = Σ_{γ ∈ I(N,M)} Q(γ) Π_{k=1}^N p^(γ(k))(ξ_k), (2)

for each N ∈ {0, ..., M}, where I(N, M) consists of all increasing injections γ : {1, ..., N} → {1, ..., M}. The sum over γ ∈ I(N, M) in Eq. (2) accounts for each possible combination of singleton presence, and the weight Q(γ) is proportional to the probability for each singleton j to be either present, q^(j), or absent, 1 − q^(j). J. Mike and VM. Nonparametric Estimation of Probability Density Functions of Random Persistence Diagrams. arXiv preprint.
27 Example 2.2 Consider two 1-dimensional singleton diagrams, D^1 and D^2, with probabilities of being nonempty q^(1) = 0.6 and q^(2) = 0.8, and local densities when nonempty p^(1)(x) = (1/√(2π)) e^{−(x+1)²/2} and p^(2)(x) = (1/√(2π)) e^{−(x−1)²/2}. The global pdf for D = D^1 ∪ D^2 is given through a set of local densities {f_0, f_1(x), f_2(x, y)} such that

f_0 = P[|D| = 0] = (1 − q^(1))(1 − q^(2)) = 0.08, (3a)

f_1(x) = (1 − q^(2)) q^(1) p^(1)(x) + (1 − q^(1)) q^(2) p^(2)(x) = (0.12/√(2π)) e^{−(x+1)²/2} + (0.32/√(2π)) e^{−(x−1)²/2}, (3b)

f_2(x, y) = (q^(1) q^(2)/2) [p^(1)(x) p^(2)(y) + p^(1)(y) p^(2)(x)] = (0.24/(2π)) (e^{−((x−1)² + (y+1)²)/2} + e^{−((x+1)² + (y−1)²)/2}). (3c)
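The closed forms in Eqs. (3a)-(3c) are easy to check numerically; the snippet below transcribes them directly (illustrative code written for this transcription, plain Python):

```python
from math import exp, pi, sqrt

q1, q2 = 0.6, 0.8
# Local densities of the two singletons when nonempty: unit-variance
# Gaussians centered at -1 and +1.
p1 = lambda x: exp(-(x + 1) ** 2 / 2) / sqrt(2 * pi)
p2 = lambda x: exp(-(x - 1) ** 2 / 2) / sqrt(2 * pi)

# Eqs. (3a)-(3c): local densities of the global pdf for D = D1 u D2.
f0 = (1 - q1) * (1 - q2)                                      # P[|D| = 0]
f1 = lambda x: (1 - q2) * q1 * p1(x) + (1 - q1) * q2 * p2(x)  # one feature
f2 = lambda x, y: 0.5 * q1 * q2 * (p1(x) * p2(y) + p1(y) * p2(x))
```

As a sanity check, f_0 = 0.08, f_1 integrates to P[|D| = 1] = 0.12 + 0.32 = 0.44, and f_2 is symmetric under swapping its two inputs, as the next slide's figure notes.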
28 Figure: Left: Plot of the local density f_1(x) in Eq. (3b); Right: Contour plot of the local density f_2(x, y) in Eq. (3c). These pdfs cover the different possible input dimensions and are symmetric under permutations of the input.
29 Past Studies: KDE Extensive work has been devoted to maps from persistence diagrams into Hilbert spaces; by mapping into a Hilbert space, these studies allow the application of statistical learning methods such as principal component analysis, random forests, support vector machines, etc. Chepushtanova et al. (2015) discretize persistence diagrams via bins, yielding vectors in a high-dimensional Euclidean space. Reininghaus et al. (2014) and Kusano et al. (2016) define kernels between persistence diagrams in a reproducing kernel Hilbert space. Adler et al. (2017) utilize Gibbs distributions in order to replicate similar persistence diagrams, e.g. for use in MCMC-type sampling. Kernel density estimation on the underlying data can estimate a target diagram: Bobrowski et al. (2014) construct an estimator for the target diagram; Fasy et al. (2014) define a confidence set.
30 Building a Kernel Density Goal: a kernel density K_σ(Z, D), with center diagram D, bandwidth σ, and input Z. Split D into upper and lower halves: D_u = {(b, d) ∈ D : d − b > σ} and D_l = {(b, d) ∈ D : d − b ≤ σ}. Define random diagrams: D^u centered at D_u, and D^l centered at D_l.
31 Building the Upper Density Split into singletons: D^u = ∪_j D^{j,u}. Each D^{j,u} is described by q_j = P[D^{j,u} ≠ ∅] and a local pdf p_j(b, d), a restricted Gaussian.
32 Building the Lower Density Lower cardinality N_l = |D_l|. Cardinality probability mass ν(j), chosen with mean N_l. Single kernel density p_l: project D_l onto the diagonal and take a kernel estimate for these points,

p_l(b, d) = (1/N_l) Σ_{(b_i, d_i) ∈ D_l} (1/(πσ²)) e^{−((b − (b_i + d_i)/2)² + (d − (b_i + d_i)/2)²)/(2σ²)}. (4)

D^l has cardinality distributed according to ν, with draws i.i.d. according to p_l, so its global pdf is

f_{D^l}(ξ_1, ..., ξ_N) = ν(N) Π_{j=1}^N p_l(ξ_j). (5)
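Eq. (4) translates directly into code. As an example, take the lower features (1, 1.3) and (3, 3.2) of the center diagram from Example 3.2 at σ = 1/2 (an illustrative sketch, not the authors' implementation):

```python
from math import exp, pi

def lower_density(b, d, D_l, sigma):
    """p_l of Eq. (4): project each low-persistence feature (b_i, d_i) onto
    the diagonal at c_i = (b_i + d_i)/2 and average Gaussian bumps of
    bandwidth sigma placed at (c_i, c_i)."""
    total = 0.0
    for b_i, d_i in D_l:
        c = (b_i + d_i) / 2
        total += exp(-((b - c) ** 2 + (d - c) ** 2)
                     / (2 * sigma ** 2)) / (pi * sigma ** 2)
    return total / len(D_l)

# Lower features of Example 3.2's center diagram at sigma = 1/2;
# they project to (1.15, 1.15) and (3.1, 3.1) on the diagonal.
D_l = [(1.0, 1.3), (3.0, 3.2)]
```

Each summand is twice a standard bivariate Gaussian density, so with its centers on the diagonal p_l carries unit mass on the half-plane d ≥ b where persistence diagrams live.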
33 The Kernel Density Theorem 3.1 (J. Mike & VM, 2018) The random diagram D = D^u ∪ D^l, with D^u and D^l defined according to the previous construction with center D and bandwidth σ, has the following global pdf, or kernel density, evaluated at Z = {ξ_1, ..., ξ_N}:

K_σ(Z, D) = Σ_{j=0}^{N_u} ν(N − j) Σ_{γ ∈ I(j,N)} Q(γ) Π_{k=1}^j p_{γ(k)}(ξ_k) Π_{k=j+1}^N p_l(ξ_k), (6)

where I(j, N) = {γ : {1, ..., j} → {1, ..., N} : γ is increasing}, and

Q(γ) = (Π_{i=1}^{N_u} (1 − q_i)) Π_{k=1}^j q_{γ(k)}/(1 − q_{γ(k)}). (7)
34 Example 3.2 Consider D = ((1, 3), (2, 4), (1, 1.3), (3, 3.2)) and σ = 1/2.
35 Kernel with input cardinality 1 The kernel K_σ((b_1, d_1), D) equals

ν(0) [(1 − q^(2)) q^(1) p^(1)(b_1, d_1) + (1 − q^(1)) q^(2) p^(2)(b_1, d_1)] + ν(1) [(1 − q^(1))(1 − q^(2)) p_l(b_1, d_1)], (8)

with p_l(b, d) = (2/π) [e^{−2((b − 1.15)² + (d − 1.15)²)} + e^{−2((b − 3.1)² + (d − 3.1)²)}], p^(1)(b_1, d_1) ∝ e^{−2((b_1 − 2)² + (d_1 − 4)²)}, and p^(2)(b_1, d_1) ∝ e^{−2((b_1 − 1)² + (d_1 − 3)²)}.

Figure: Contour map for the kernel density restricted to a single input feature (Eq. (8)). The center diagram is indicated by red (upper) and green (lower) points. Scale bars at the right of each plot indicate the range of probability density in each shaded region.
36 Kernel with input cardinality 2 Consider Z = (ξ_1, ξ_2) = ((b_1, d_1), (b_2, d_2)). Then

K_σ((ξ_1, ξ_2), D) = ν(0) q^(1) q^(2) p^(1)(b_1, d_1) p^(2)(b_2, d_2) + ν(1) [(1 − q^(2)) q^(1) p^(1)(b_1, d_1) + (1 − q^(1)) q^(2) p^(2)(b_1, d_1)] p_l(b_2, d_2) + ν(2) (1 − q^(1))(1 − q^(2)) p_l(b_1, d_1) p_l(b_2, d_2). (9)
37 Kernel with input cardinality 2 Figure: Contour maps for slices of the kernel density K_σ((ξ, ξ_2), D) with input cardinality 2. A single feature ξ_2, indicated by white crosshairs, is fixed to restrict to a 2D subspace as follows: (a) ξ_2 = (1, 3), (b) ξ_2 = (2, 4), and (c) ξ_2 = (2.5, 2.7). The center diagram is indicated by red (upper) and green (lower) points. Scale bars at the right of each plot indicate the range of probability density in each shaded region.
38 Kernel with input cardinality 3

K_σ((ξ_1, ξ_2, ξ_3), D) = ν(1) [q^(1) q^(2) p^(1)(b_1, d_1) p^(2)(b_2, d_2)] p_l(b_3, d_3) + ν(2) [(1 − q^(2)) q^(1) p^(1)(b_1, d_1) + (1 − q^(1)) q^(2) p^(2)(b_1, d_1)] p_l(b_2, d_2) p_l(b_3, d_3) + ν(3) (1 − q^(1))(1 − q^(2)) p_l(b_1, d_1) p_l(b_2, d_2) p_l(b_3, d_3). (10)
39 Kernel with input cardinality 3 Figure: Contour maps for slices of the kernel density K_σ((ξ, ξ_2, ξ_3), D) with input cardinality 3. A pair of features ξ_2 and ξ_3, indicated by white crosshairs, is fixed to restrict to a 2D subspace as follows: (a) (ξ_2, ξ_3) = ((1, 3), (2, 4)) and (b) (ξ_2, ξ_3) = ((1, 3), (2.5, 3.5)). The center diagram is indicated by red (upper) and green (lower) points. Scale bars at the right of each plot indicate the range of probability density in each shaded region.
40 Kernel Density Estimation Theorem 3.3 (J. Mike & VM, 2018) Let f be the global pdf of a random persistence diagram, where f satisfies suitable decay and boundedness conditions, and let the diagrams {D_i}_{i=1}^n be sampled i.i.d. according to f. They yield the KDE

f̂(Z) = (1/n) Σ_{i=1}^n K_σ(Z, D_i).

With bandwidth σ = O(n^{−α}) for a suitably small exponent α > 0, f̂ → f uniformly on compact subsets of the space of persistence diagrams. J. Mike and VM. Nonparametric Estimation of Probability Density Functions of Random Persistence Diagrams. arXiv preprint.
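The estimator of Theorem 3.3 is a plain average of kernel evaluations over the sample diagrams. A generic sketch follows, with a deliberately toy stand-in kernel (NOT the K_σ of Theorem 3.1) used only to exercise the averaging:

```python
from math import exp

def kde(Z, diagrams, sigma, kernel):
    """Nonparametric estimate f_hat(Z) = (1/n) * sum_i K_sigma(Z, D_i)
    over n i.i.d. sample diagrams, for any kernel(Z, D, sigma)."""
    return sum(kernel(Z, D, sigma) for D in diagrams) / len(diagrams)

def toy_kernel(Z, D, sigma):
    """Toy stand-in: an (unnormalized) Gaussian in total persistence.
    A real implementation would use the kernel density of Theorem 3.1."""
    tp = lambda diag: sum(d - b for b, d in diag)
    return exp(-((tp(Z) - tp(D)) ** 2) / (2 * sigma ** 2))

D1 = [(0.0, 1.0)]
D2 = [(0.0, 2.0)]
```

Because the estimator is a simple mean, evaluating it on identical sample diagrams reproduces a single kernel evaluation, and mixing in dissimilar diagrams lowers the estimate; that is the only behavior the toy kernel is meant to show.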
41 Example 3.4 Generate samples which each consist of 10 points drawn uniformly from the unit circle with additive Gaussian noise N((0, 0), (1/50)² I_2): a toy dataset for signal analysis. Figure: panels (a), (b).
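The sampling scheme of Example 3.4 can be reproduced with the standard library alone (hypothetical helper name, written for this transcription):

```python
import math
import random

def sample_noisy_circle(n=10, noise=1 / 50, rng=random):
    """Draw n points uniformly from the unit circle and add isotropic
    Gaussian noise N((0, 0), noise^2 * I_2), as in Example 3.4."""
    pts = []
    for _ in range(n):
        theta = rng.uniform(0, 2 * math.pi)
        pts.append((math.cos(theta) + rng.gauss(0, noise),
                    math.sin(theta) + rng.gauss(0, noise)))
    return pts
```

With noise standard deviation 1/50, the points stay very close to the unit circle, so the Betti-1 feature of the circle dominates the resulting persistence diagrams.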
42 Plots of persistence diagram KDEs. Color indicates the probability density; white regions above the diagonal indicate portions of very low probability density. Each column is a particular slice, while each row is a particular KDE with n and σ. Left: Local KDEs f̂_{n,σ}((b, d)) evaluated at a diagram with only one feature; the mode of the converged density is approximately (b_2, d_2) = (0.77, 0.98). Right: Local KDEs f̂_{n,σ}((b, d), (0.77, 0.98)) evaluated at a diagram with two features and one feature fixed; these slices have two modes which are very close to the diagonal, at (0, 0) and (1, 1).
43 Summary We considered the problem of estimating the distribution of persistence diagrams and established a novel kernel density, focusing on set properties: membership and cardinality. We established convergence and verified several synthetic examples. With a pdf at hand, we can start implementing Monte Carlo sampling and move on to further probabilistic settings: a Bayesian formulation, and applications in biology, defense, materials science and chemistry.
44 Bayesian Framework In principle, we can compute posterior distributions using Bayes' theorem for random sets: π(D_X | D_Y) ∝ l(D_Y | D_X) π(D_X).
45 Posterior Approximation Mahler (2003); Singh, Vo, Baddeley and Zuyev (2007); Caron, Del Moral, Doucet, Pace (2011). Figure: persistence intensity vs. birth for panels (a) prior, (b) prior, and (c) posterior.
46 Thank you. Questions?
More informationPersistent Homology. 128 VI Persistence
8 VI Persistence VI. Persistent Homology A main purpose of persistent homology is the measurement of the scale or resolution of a topological feature. There are two ingredients, one geometric, assigning
More informationarxiv: v2 [math.pr] 15 May 2016
MAXIMALLY PERSISTENT CYCLES IN RANDOM GEOMETRIC COMPLEXES OMER BOBROWSKI, MATTHEW KAHLE, AND PRIMOZ SKRABA arxiv:1509.04347v2 [math.pr] 15 May 2016 Abstract. We initiate the study of persistent homology
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationKernel-Based Contrast Functions for Sufficient Dimension Reduction
Kernel-Based Contrast Functions for Sufficient Dimension Reduction Michael I. Jordan Departments of Statistics and EECS University of California, Berkeley Joint work with Kenji Fukumizu and Francis Bach
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationAdvanced Introduction to Machine Learning CMU-10715
Advanced Introduction to Machine Learning CMU-10715 Gaussian Processes Barnabás Póczos http://www.gaussianprocess.org/ 2 Some of these slides in the intro are taken from D. Lizotte, R. Parr, C. Guesterin
More informationStabilizing the unstable output of persistent homology computations
Stabilizing the unstable output of persistent homology computations Peter Bubenik with Paul Bendich (Duke) and Alex Wagner (Florida) May 5, 2017 Conference on Applied and Computational Algebraic Topology
More informationStatistical Learning. Philipp Koehn. 10 November 2015
Statistical Learning Philipp Koehn 10 November 2015 Outline 1 Learning agents Inductive learning Decision tree learning Measuring learning performance Bayesian learning Maximum a posteriori and maximum
More informationOn Inverse Problems in TDA
Abel Symposium Geiranger, June 2018 On Inverse Problems in TDA Steve Oudot joint work with E. Solomon (Brown Math.) arxiv: 1712.03630 Features Data Rn The preimage problem in the data Sciences feature
More information20: Gaussian Processes
10-708: Probabilistic Graphical Models 10-708, Spring 2016 20: Gaussian Processes Lecturer: Andrew Gordon Wilson Scribes: Sai Ganesh Bandiatmakuri 1 Discussion about ML Here we discuss an introduction
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory
More informationNonparametric Bayes Inference on Manifolds with Applications
Nonparametric Bayes Inference on Manifolds with Applications Abhishek Bhattacharya Indian Statistical Institute Based on the book Nonparametric Statistics On Manifolds With Applications To Shape Spaces
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationAdaptive Crowdsourcing via EM with Prior
Adaptive Crowdsourcing via EM with Prior Peter Maginnis and Tanmay Gupta May, 205 In this work, we make two primary contributions: derivation of the EM update for the shifted and rescaled beta prior and
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory
More informationStatistical techniques for data analysis in Cosmology
Statistical techniques for data analysis in Cosmology arxiv:0712.3028; arxiv:0911.3105 Numerical recipes (the bible ) Licia Verde ICREA & ICC UB-IEEC http://icc.ub.edu/~liciaverde outline Lecture 1: Introduction
More informationThe Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model
Thai Journal of Mathematics : 45 58 Special Issue: Annual Meeting in Mathematics 207 http://thaijmath.in.cmu.ac.th ISSN 686-0209 The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for
More informationPractical Bayesian Optimization of Machine Learning. Learning Algorithms
Practical Bayesian Optimization of Machine Learning Algorithms CS 294 University of California, Berkeley Tuesday, April 20, 2016 Motivation Machine Learning Algorithms (MLA s) have hyperparameters that
More informationMachine Learning - MT & 14. PCA and MDS
Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)
More informationA Gaussian Type Kernel for Persistence Diagrams
TRIPODS Summer Bootcamp: Topology and Machine Learning Brown University, August 2018 A Gaussian Type Kernel for Persistence Diagrams Mathieu Carrière joint work with S. Oudot and M. Cuturi Persistence
More informationIntroduction to Machine Learning
Outline Introduction to Machine Learning Bayesian Classification Varun Chandola March 8, 017 1. {circular,large,light,smooth,thick}, malignant. {circular,large,light,irregular,thick}, malignant 3. {oval,large,dark,smooth,thin},
More informationTutorial on Gaussian Processes and the Gaussian Process Latent Variable Model
Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model (& discussion on the GPLVM tech. report by Prof. N. Lawrence, 06) Andreas Damianou Department of Neuro- and Computer Science,
More informationClustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation
Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based clustering Model estimation 1 Clustering A basic tool in data mining/pattern recognition: Divide
More informationChris Bishop s PRML Ch. 8: Graphical Models
Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows Kn-Nearest
More informationA Review of Pseudo-Marginal Markov Chain Monte Carlo
A Review of Pseudo-Marginal Markov Chain Monte Carlo Discussed by: Yizhe Zhang October 21, 2016 Outline 1 Overview 2 Paper review 3 experiment 4 conclusion Motivation & overview Notation: θ denotes the
More informationSYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I
SYDE 372 Introduction to Pattern Recognition Probability Measures for Classification: Part I Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 Why use probability
More informationLinear vs Non-linear classifier. CS789: Machine Learning and Neural Network. Introduction
Linear vs Non-linear classifier CS789: Machine Learning and Neural Network Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Linear classifier is in the
More informationClustering VS Classification
MCQ Clustering VS Classification 1. What is the relation between the distance between clusters and the corresponding class discriminability? a. proportional b. inversely-proportional c. no-relation Ans:
More informationHomework in Topology, Spring 2009.
Homework in Topology, Spring 2009. Björn Gustafsson April 29, 2009 1 Generalities To pass the course you should hand in correct and well-written solutions of approximately 10-15 of the problems. For higher
More informationKernel Density Topic Models: Visual Topics Without Visual Words
Kernel Density Topic Models: Visual Topics Without Visual Words Konstantinos Rematas K.U. Leuven ESAT-iMinds krematas@esat.kuleuven.be Mario Fritz Max Planck Institute for Informatics mfrtiz@mpi-inf.mpg.de
More information6.867 Machine Learning
6.867 Machine Learning Problem Set 2 Due date: Wednesday October 6 Please address all questions and comments about this problem set to 6867-staff@csail.mit.edu. You will need to use MATLAB for some of
More informationComputational Genomics. Systems biology. Putting it together: Data integration using graphical models
02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput
More informationProbabilistic Graphical Networks: Definitions and Basic Results
This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical
More informationLecture Notes 1: Vector spaces
Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector
More informationProbabilistic Graphical Models Lecture 20: Gaussian Processes
Probabilistic Graphical Models Lecture 20: Gaussian Processes Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 30, 2015 1 / 53 What is Machine Learning? Machine learning algorithms
More informationW vs. QCD Jet Tagging at the Large Hadron Collider
W vs. QCD Jet Tagging at the Large Hadron Collider Bryan Anenberg: anenberg@stanford.edu; CS229 December 13, 2013 Problem Statement High energy collisions of protons at the Large Hadron Collider (LHC)
More information