Geometric Inference for Probability Distributions
F. Chazal¹, D. Cohen-Steiner², Q. Mérigot²
¹ Geometrica, INRIA Saclay; ² Geometrica, INRIA Sophia-Antipolis
June 1, 2009
Motivation
What is the (relevant) topology/geometry of a point cloud data set in $\mathbb{R}^d$?
Motivations: reconstruction, manifold learning and non-linear dimensionality reduction (NLDR), clustering and segmentation, etc.
Geometric inference
Question: Given an approximation $C$ of a geometric object $K$, is it possible to estimate the topological and geometric properties of $K$, knowing only the approximation $C$?
The answer depends on the considered class of objects and on a notion of distance between the objects (approximation). There is a positive answer for a large class of compact sets endowed with the Hausdorff distance (i.e. $K$ and $C$ are close if all the points of $C$ (resp. $K$) are close to $K$ (resp. $C$)).
In this talk:
- can the considered objects be probability measures on $\mathbb{R}^d$?
- motivation: allowing approximations to have outliers or to be corrupted by non-local noise.
Distance functions for geometric inference
Distance function and Hausdorff distance:
- distance to a compact $K \subset \mathbb{R}^d$: $d_K : x \mapsto \inf_{p \in K} \|x - p\|$;
- Hausdorff distance between compact sets $K, K' \subset \mathbb{R}^d$: $d_H(K, K') = \sup_{x \in \mathbb{R}^d} |d_K(x) - d_{K'}(x)|$.
Replace $K$ and $C$ by $d_K$ and $d_C$, and compare the topology of the offsets $K^r = d_K^{-1}([0, r])$ and $C^r = d_C^{-1}([0, r])$.
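The two definitions above can be made concrete for finite point sets. A minimal sketch in pure Python (function names are illustrative, not from the talk): the distance function to a finite set, and the Hausdorff distance computed from it.

```python
# Sketch: distance function d_K to a finite set, and Hausdorff distance
# between two finite point sets in R^d. Names are illustrative.
import math

def dist(x, p):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, p)))

def d_K(x, K):
    """Distance from x to the finite set K: inf_{p in K} ||x - p||."""
    return min(dist(x, p) for p in K)

def hausdorff(K1, K2):
    """d_H(K1, K2) = max over both one-sided sup-distances."""
    return max(max(d_K(x, K2) for x in K1),
               max(d_K(x, K1) for x in K2))

K = [(0.0, 0.0), (1.0, 0.0)]
C = [(0.0, 0.1), (1.0, 0.0)]
print(hausdorff(K, C))  # 0.1: C perturbs one point of K by 0.1
```

For finite sets, $\sup_{x \in \mathbb{R}^d} |d_K(x) - d_C(x)|$ is attained on $K \cup C$, which is why the two-sided maximum above suffices.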
Stability properties of the offsets
Topological/geometric properties of the offsets of $K$ are stable with respect to Hausdorff approximation:
1. topological stability of the offsets of $K$ (CCSL 06, NSW 06);
2. approximate normal cones (CCSL 08);
3. boundary measures (CCSM 07), curvature measures (CCSLT 09), Voronoi covariance measures (GMO 09).
The problem of outliers
If $K' = K \cup \{x\}$ where $d_K(x) > R$, then $\|d_K - d_{K'}\|_\infty > R$: offset-based inference methods fail!
Question: Can we generalize the previous approach by replacing the distance function with a distance-like function that behaves better with respect to noise and outliers?
The three main ingredients for stability
The proofs of stability rely on three important theoretical facts:
1. the stability of the map $K \mapsto d_K$: $\|d_K - d_{K'}\|_\infty = \sup_{x \in \mathbb{R}^d} |d_K(x) - d_{K'}(x)| = d_H(K, K')$;
2. the 1-Lipschitz property of $d_K$;
3. the 1-concavity of the function $d_K^2$: $x \mapsto \|x\|^2 - d_K^2(x)$ is convex.
Replacing compact sets by measures
A measure $\mu$ is a mass distribution on $\mathbb{R}^d$. Mathematically, it is defined as a map $\mu$ that takes a (Borel) subset $B \subseteq \mathbb{R}^d$ and outputs a nonnegative number $\mu(B)$. Moreover, we ask that if $(B_i)$ are disjoint subsets, $\mu\left(\cup_{i \in \mathbb{N}} B_i\right) = \sum_{i \in \mathbb{N}} \mu(B_i)$.
- $\mu(B)$ corresponds to the mass of $\mu$ contained in $B$;
- a point cloud $C = \{p_1, \ldots, p_n\}$ defines a measure $\mu_C = \frac{1}{n} \sum_i \delta_{p_i}$;
- the volume form on a $k$-dimensional submanifold $M$ of $\mathbb{R}^d$ defines a measure $\mathrm{vol}^k_M$.
Distance between measures
The Wasserstein distance $d_W(\mu, \nu)$ between two probability measures $\mu, \nu$ quantifies the optimal cost of pushing $\mu$ onto $\nu$, the cost of moving a small mass $dx$ from $x$ to $y$ being $\|x - y\|^2\, dx$.
1. $\mu$ and $\nu$ are discrete measures: $\mu = \sum_i c_i \delta_{x_i}$, $\nu = \sum_j d_j \delta_{y_j}$ with $\sum_j d_j = \sum_i c_i$.
2. Transport plan: a set of coefficients $\pi_{ij} \geq 0$ with $\sum_i \pi_{ij} = d_j$ and $\sum_j \pi_{ij} = c_i$.
3. Cost of a transport plan: $C(\pi) = \left( \sum_{ij} \|x_i - y_j\|^2 \pi_{ij} \right)^{1/2}$.
4. $d_W(\mu, \nu) := \inf_\pi C(\pi)$.
Wasserstein distance
Examples:
- If $C_1$ and $C_2$ are two point clouds with $\#C_1 = \#C_2$, then $d_W(\mu_{C_1}, \mu_{C_2})$ is the square root of the cost of a minimal least-square matching between $C_1$ and $C_2$.
- If $C = \{p_1, \ldots, p_n\}$ is a point cloud, and $C' = \{p_1, \ldots, p_{n-k}, o_1, \ldots, o_k\}$ with $d(o_i, C) = R$, then $d_H(C, C') \geq R$ but $d_W(\mu_C, \mu_{C'}) \leq \sqrt{\tfrac{k}{n}}\, (R + \mathrm{diam}(C))$.
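The first example above can be checked directly: for equal-size clouds with uniform weights, every transport plan that is a matching assigns mass $1/n$ along a permutation, so brute force over permutations gives $d_W$ exactly. A sketch for tiny clouds only (names are illustrative):

```python
# Sketch: d_W between two equal-size uniform point clouds, by brute force
# over all matchings (permutations). Exponential; tiny inputs only.
import itertools, math

def w2_matching(C1, C2):
    n = len(C1)
    assert n == len(C2)
    best = float("inf")
    for perm in itertools.permutations(range(n)):
        # squared-distance cost of matching p_i <-> q_{perm(i)},
        # each pair carrying mass 1/n
        cost = sum(sum((a - b) ** 2 for a, b in zip(C1[i], C2[perm[i]]))
                   for i in range(n)) / n
        best = min(best, cost)
    return math.sqrt(best)

C1 = [(0.0, 0.0), (1.0, 0.0)]
C2 = [(1.0, 0.0), (0.0, 1.0)]
print(w2_matching(C1, C2))
```

On this input the optimal matching sends $(1,0)$ to itself and $(0,0)$ to $(0,1)$, for a cost of $\sqrt{1/2}$; the identity matching would cost $\sqrt{3/2}$.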
The distance to a measure
Distance function to a measure, first attempt. Let $m \in\ ]0, 1[$ be a positive mass, and $\mu$ a probability measure on $\mathbb{R}^d$:
$\delta_{\mu,m}(x) = \inf \{r > 0 \,;\, \mu(B(x, r)) > m\}$
- $\delta_{\mu,m}(x)$ is the smallest distance needed to attain a mass of at least $m$;
- it coincides with the distance to the $k$-th nearest neighbor when $m = k/n$ and $\mu = \frac{1}{n} \sum_{i=1}^n \delta_{p_i}$: $\delta_{\mu,k/n}(x) = \|x - p_C^k(x)\|$.
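For an empirical measure, the slide's coincidence with the $k$-th nearest-neighbor distance makes $\delta_{\mu,m}$ a one-liner. A sketch following that convention (names are illustrative; ball/boundary conventions at mass exactly $m$ are glossed over):

```python
# Sketch: delta_{mu,m}(x) for the uniform empirical measure of a point
# cloud C, using the k-th nearest-neighbor identity with k = ceil(m*n).
import math

def delta(x, C, m):
    n = len(C)
    k = math.ceil(m * n)                     # smallest k with k/n >= m
    dists = sorted(math.dist(x, p) for p in C)
    return dists[k - 1]                      # k-th nearest-neighbor distance

C = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(delta((0.0, 0.0), C, 0.5))  # k = 2: distance to the 2nd nearest neighbor
```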
Instability of $\mu \mapsto \delta_{\mu,m}$
Distance to a measure, first attempt. Let $m \in\ ]0, 1[$ be a positive mass, and $\mu$ a probability measure on $\mathbb{R}^d$:
$\delta_{\mu,m}(x) = \inf \{r > 0 \,;\, \mu(B(x, r)) > m\}$
Instability under Wasserstein perturbations: let $\mu_\varepsilon = (1/2 - \varepsilon)\delta_0 + (1/2 + \varepsilon)\delta_1$.
- for $\varepsilon > 0$ and $x < 0$: $\delta_{\mu_\varepsilon,1/2}(x) = |x - 1|$;
- for $\varepsilon = 0$ and $x < 0$: $\delta_{\mu_0,1/2}(x) = |x|$.
Consequence: the map $\mu \mapsto \delta_{\mu,m} \in C^0(\mathbb{R}^d)$ is discontinuous, whatever the (reasonable) topology on $C^0(\mathbb{R}^d)$.
The distance function to a measure
Definition. If $\mu$ is a measure on $\mathbb{R}^d$ and $m_0 > 0$, one lets:
$d_{\mu,m_0} : x \in \mathbb{R}^d \mapsto \left( \frac{1}{m_0} \int_0^{m_0} \delta_{\mu,m}^2(x)\, dm \right)^{1/2}$
Example. Let $C = \{p_1, \ldots, p_n\}$ and $\mu = \frac{1}{n} \sum_{i=1}^n \delta_{p_i}$. Let $p_C^k(x)$ denote the $k$-th nearest neighbor of $x$ in $C$, and set $m_0 = k_0/n$:
$d_{\mu,m_0}(x) = \left( \frac{1}{k_0} \sum_{k=1}^{k_0} \|x - p_C^k(x)\|^2 \right)^{1/2}$
The distance function to a discrete measure
Example (continued). Let $C = \{p_1, \ldots, p_n\}$ and $\mu = \frac{1}{n} \sum_{i=1}^n \delta_{p_i}$. With $m_0 = k_0/n$:
$d_{\mu,m_0}(x) = \left( \frac{1}{k_0} \sum_{k=1}^{k_0} \|x - p_C^k(x)\|^2 \right)^{1/2}$
$\|x\|^2 - d^2_{\mu,m_0}(x) = \frac{1}{k_0} \sum_{k=1}^{k_0} \left( 2 \langle x, p_C^k(x) \rangle - \|p_C^k(x)\|^2 \right)$
Hence, this function is linear inside $k_0$-th order Voronoï cells, and:
$\nabla \left( \|x\|^2 - d^2_{\mu,m_0}(x) \right) = \frac{2}{k_0} \sum_{k=1}^{k_0} p_C^k(x)$
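Both point-cloud formulas above are directly computable: $d_{\mu,m_0}$ is the root mean square of the $k_0$ nearest-neighbor distances, and the gradient of $\|x\|^2 - d^2_{\mu,m_0}$ is twice the barycenter of the $k_0$ nearest neighbors. A sketch (function names are illustrative):

```python
# Sketch: d_{mu,m0} for a uniform point cloud (RMS of the k0 nearest-
# neighbor distances) and grad(||x||^2 - d^2_{mu,m0}) = (2/k0) * sum of
# the k0 nearest neighbors. Names are illustrative.
import math

def d_mu(x, C, k0):
    dists = sorted(math.dist(x, p) for p in C)[:k0]
    return math.sqrt(sum(d * d for d in dists) / k0)

def grad_v(x, C, k0):
    """Gradient of ||x||^2 - d^2_{mu,m0}: twice the k0-NN barycenter."""
    nn = sorted(C, key=lambda p: math.dist(x, p))[:k0]
    return tuple(2.0 / k0 * sum(p[i] for p in nn) for i in range(len(x)))

C = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
x = (0.0, 0.0)
print(d_mu(x, C, 3))    # sqrt((0 + 4 + 4) / 3)
print(grad_v(x, C, 3))  # (2/3) * (0+2+0, 0+0+2) = (4/3, 4/3)
```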
1-concavity of the squared distance function
Example (continued).
$d^2_{\mu,m_0}(y) = \frac{1}{k_0} \sum_{k=1}^{k_0} \|y - p_C^k(y)\|^2 \leq \frac{1}{k_0} \sum_{k=1}^{k_0} \|y - p_C^k(x)\|^2 = \frac{1}{k_0} \sum_{k=1}^{k_0} \|y - x + x - p_C^k(x)\|^2$
$= \|y - x\|^2 + \frac{1}{k_0} \sum_{k=1}^{k_0} \|x - p_C^k(x)\|^2 + \frac{2}{k_0} \sum_{k=1}^{k_0} \langle y - x,\, x - p_C^k(x) \rangle$
$= \|y - x\|^2 + d^2_{\mu,m_0}(x) + \langle y - x,\, \nabla d^2_{\mu,m_0}(x) \rangle$
Proposition
The squared distance function to a discrete measure is 1-concave, i.e.
$\|y\|^2 - d^2_{\mu,m_0}(y) \geq \|x\|^2 - d^2_{\mu,m_0}(x) + \langle \nabla(\|x\|^2 - d^2_{\mu,m_0})(x),\, y - x \rangle$
This is equivalent to the function $x \mapsto \|x\|^2 - d^2_{\mu,m_0}(x)$ being convex.
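The convexity claim is easy to sanity-check numerically for a point cloud: $v(x) = \|x\|^2 - d^2_{\mu,m_0}(x)$ should satisfy $v(tx + (1-t)y) \leq t\,v(x) + (1-t)\,v(y)$ for all pairs. A sketch over random pairs (illustrative names; `d_mu` recomputes the point-cloud distance as above):

```python
# Sketch: randomized check that v(x) = ||x||^2 - d^2_{mu,m0}(x) is convex
# for a uniform point cloud.
import math, random

def d_mu(x, C, k0):
    dists = sorted(math.dist(x, p) for p in C)[:k0]
    return math.sqrt(sum(d * d for d in dists) / k0)

def v(x, C, k0):
    return sum(xi * xi for xi in x) - d_mu(x, C, k0) ** 2

random.seed(0)
C = [(random.random(), random.random()) for _ in range(20)]
k0 = 5
ok = True
for _ in range(1000):
    x = (random.uniform(-2, 2), random.uniform(-2, 2))
    y = (random.uniform(-2, 2), random.uniform(-2, 2))
    t = random.random()
    z = tuple(t * a + (1 - t) * b for a, b in zip(x, y))
    # convexity: value at the midpoint never exceeds the chord
    if v(z, C, k0) > t * v(x, C, k0) + (1 - t) * v(y, C, k0) + 1e-9:
        ok = False
print(ok)  # True: no convexity violation found
```

This also matches the structural reason for convexity: $v$ is a maximum of affine functions, one per $k_0$-point subset of $C$.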
Another expression for $d_{\mu,m_0}$
Projection submeasure. For any $x \in \mathbb{R}^d$ and $m > 0$, denote by $\mu_{x,m}$ the restriction of $\mu$ to the ball $B = B(x, \delta_{\mu,m}(x))$, whose trace on the sphere $\partial B$ has been rescaled so that the total mass of $\mu_{x,m}$ is $m$.
- The measure $\mu_{x,m}$ gives mass to the multiple projections of $x$ on $\mu$;
- in the point cloud case, when $m_0 = k_0/n + r$ with $0 < r < 1/n$, and $x$ is not on a $k$-th order Voronoï face,
$\mu_{x,m_0} = \sum_{k=1}^{k_0} \frac{1}{n}\, \delta_{p_C^k(x)} + r\, \delta_{p_C^{k_0+1}(x)}$
Another expression for $d_{\mu,m_0}$
Proposition. Let $\mu$ be a probability measure on $\mathbb{R}^d$ and $m_0 > 0$. Then, for almost every point $x \in \mathbb{R}^d$, $d_{\mu,m_0}$ is differentiable at $x$ and:
$d^2_{\mu,m_0}(x) = \frac{1}{m_0} \int \|x - h\|^2\, d\mu_{x,m_0}(h)$
$\nabla \left( \|x\|^2 - d^2_{\mu,m_0}(x) \right) = \frac{2}{m_0} \int h\, d\mu_{x,m_0}(h)$
- in the point cloud case: $\nabla \left( \|x\|^2 - d^2_{\mu,m_0}(x) \right) = \frac{2}{k_0} \sum_{k=1}^{k_0} p_C^k(x)$;
- interpretation: $d_{\mu,m_0}(x)$ is the (partial) Wasserstein distance between the Dirac mass $m_0 \delta_x$ and $\mu$.
Theorem
1. the function $x \mapsto d_{\mu,m_0}(x)$ is 1-Lipschitz;
2. the function $x \mapsto \|x\|^2 - d^2_{\mu,m_0}(x)$ is convex;
3. the map $\mu \mapsto d_{\mu,m_0}$ from probability measures to continuous functions is $\frac{1}{\sqrt{m_0}}$-Lipschitz, i.e. $\|d_{\mu,m_0} - d_{\mu',m_0}\|_\infty \leq \frac{1}{\sqrt{m_0}}\, d_W(\mu, \mu')$.
The proofs of 2 and 3 involve the second expression for $d_{\mu,m_0}$. Properties 1 and 2 are related to the regularity of $d_{\mu,m_0}$, while 3 is a stability property.
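Property 1 can also be probed numerically: the ratio $|d_{\mu,m_0}(x) - d_{\mu,m_0}(y)| / \|x - y\|$ should never exceed 1. A randomized sketch (illustrative names, point-cloud `d_mu` as before):

```python
# Sketch: randomized check that x -> d_{mu,m0}(x) is 1-Lipschitz for a
# uniform point cloud.
import math, random

def d_mu(x, C, k0):
    dists = sorted(math.dist(x, p) for p in C)[:k0]
    return math.sqrt(sum(d * d for d in dists) / k0)

random.seed(1)
C = [(random.random(), random.random()) for _ in range(30)]
k0 = 7
worst = 0.0
for _ in range(1000):
    x = (random.uniform(-3, 3), random.uniform(-3, 3))
    y = (random.uniform(-3, 3), random.uniform(-3, 3))
    ratio = abs(d_mu(x, C, k0) - d_mu(y, C, k0)) / math.dist(x, y)
    worst = max(worst, ratio)
print(worst <= 1.0 + 1e-9)  # True: no Lipschitz violation found
```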
Consequences of the previous properties
1. existence of an analogue of the medial axis;
2. stability of a filtered version of it (as with the $\mu$-medial axis) under Wasserstein perturbations;
3. stability of the critical function of a measure;
4. the gradient $\nabla d_{\mu,m_0}$ is $L^1$-stable;
5. ...
$\Rightarrow$ the distance functions $d_{\mu,m_0}$ share many stability and regularity properties with the usual distance function.
Example: square with outliers
(Figure: $\delta_{\mu,m_0}$ with $m_0 = 1/10$, on a square with 10% outliers, $k = 150$.)
Example: square with outliers
(Figure: $d_{\mu,m_0}$ on the same data.)
A 3D example Reconstruction of an offset from a noisy dataset, with 10% outliers
A reconstruction theorem
Theorem. Let $\mu$ be a probability measure of dimension at most $k > 0$ with compact support $K \subset \mathbb{R}^d$ such that $r_\alpha(K) > 0$ for some $\alpha \in (0, 1]$. For any $0 < \eta < r_\alpha(K)$, there exist positive constants $m_1 = m_1(\mu, \alpha, \eta) > 0$ and $C = C(m_1) > 0$ such that: for any $m_0 < m_1$ and any probability measure $\mu'$ with $W_2(\mu, \mu') < C \sqrt{m_0}$, the sublevel set $d_{\mu',m_0}^{-1}((-\infty, \eta])$ is homotopy equivalent (and even isotopic) to the offsets $d_K^{-1}([0, r])$ of $K$ for $0 < r < r_\alpha(K)$.
k-NN density estimation vs. distance to a measure
(Figure: the data.) Density is estimated using $x \mapsto \frac{m_0}{\omega_d\, \delta_{\mu,m_0}(x)^d}$, with $m_0 = 150/1200$.
k-NN density estimation vs. distance to a measure
(Figures: the estimated density, and $\log(1 + 1/d_{\mu,m_0})$.) Density is estimated using $x \mapsto \frac{m_0}{\mathrm{vol}_d(B(x, \delta_{\mu,m_0}(x)))}$, with $m_0 = 150/1200$ (Devroye-Wagner 77).
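The two quantities being compared on these slides can be sketched side by side for a toy 2D cloud: the k-NN density estimate $m_0 / \mathrm{vol}_d(B(x, \delta_{\mu,m_0}(x)))$ versus $\log(1 + 1/d_{\mu,m_0}(x))$. A sketch (names and the toy data are illustrative, not the talk's dataset):

```python
# Sketch: k-NN density estimate vs. the distance-to-measure quantity
# log(1 + 1/d_{mu,m0}) on a toy 2D Gaussian cloud.
import math, random

def knn_radius(x, C, k):
    return sorted(math.dist(x, p) for p in C)[k - 1]

def knn_density(x, C, k):
    """m0 / vol_2(B(x, delta)), with m0 = k/n and vol_2 = pi * r^2."""
    n = len(C)
    r = knn_radius(x, C, k)
    return (k / n) / (math.pi * r * r)

def d_mu(x, C, k0):
    dists = sorted(math.dist(x, p) for p in C)[:k0]
    return math.sqrt(sum(d * d for d in dists) / k0)

random.seed(2)
C = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
x = (0.0, 0.0)
print(knn_density(x, C, 20))
print(math.log(1 + 1 / d_mu(x, C, 20)))
```

Averaging over the first $k_0$ neighbors (as $d_{\mu,m_0}$ does) is what smooths out the wild gradient behavior of the raw k-NN radius.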
k-NN density estimation vs. distance to a measure
1. the gradient of the estimated density can behave wildly;
2. the estimated density exhibits peaks near very dense zones.
Point 1 can be fixed using $d_{\mu,m_0}$ (because of its semiconcavity); point 2 shows that the distance function is a better-behaved geometric object to associate to a measure.
Pushing data along the gradient of $d_{\mu,m_0}$
A Mean-Shift-like algorithm (Comaniciu-Meer 02), with theoretical guarantees on the convergence of the algorithm and the smoothness of trajectories.
Pushing data along the gradient of d µ,m0
Take-home messages
- $\mu \mapsto d_{\mu,m_0}$ provides a way to associate geometry to a measure in Euclidean space.
- $d_{\mu,m_0}$ is robust to Wasserstein perturbations: outliers and noise are easily handled (no assumption on the nature of the noise).
- $d_{\mu,m_0}$ shares regularity properties with the usual distance function to a compact set.
- Geometric stability results hold in this measure-theoretic setting: topology/geometry of the sublevel sets of $d_{\mu,m_0}$, a stable notion of persistence diagram for $\mu$, ...
- Algorithm: for finite point clouds, $d_{\mu,m_0}$ and $\nabla d_{\mu,m_0}$ can be easily and efficiently computed in any dimension.
For more details: http://geometrica.saclay.inria.fr/team/fred.chazal/papers/rr-6930.pdf