Proximal point algorithm in Hadamard spaces. Miroslav Bačák, Télécom ParisTech. Optimisation Géométrique sur les Variétés, Paris, 21 November 2014
Contents of the talk 1 Basic facts on Hadamard spaces 2 Proximal point algorithm 3 Applications to computational phylogenetics
Proximal point algorithm in Hadamard spaces: why? It is used in:
Phylogenetics: computing medians and means of phylogenetic trees.
Diffusion tensor imaging: the space $P(n, \mathbb{R})$ of symmetric positive definite $n \times n$ matrices with real entries is a Hadamard space when equipped with the Riemannian metric $\langle X, Y \rangle_A := \operatorname{Tr}(A^{-1} X A^{-1} Y)$, for $X, Y \in T_A(P(n, \mathbb{R}))$ and every $A \in P(n, \mathbb{R})$.
Computational biology: shape analysis of tree-like structures.
Tree-like structures in organisms Figure: Bronchial tubes in lungs Figure: Transport system in plants Figure: Human circulatory system
Definition of Hadamard space
Let $(H, d)$ be a complete metric space where:
1. any two points $x_0$ and $x_1$ are connected by a geodesic $x \colon [0,1] \to H \colon t \mapsto x_t$,
2. $d(y, x_t)^2 \le (1-t)\, d(y, x_0)^2 + t\, d(y, x_1)^2 - t(1-t)\, d(x_0, x_1)^2$, for every $y \in H$ and $t \in [0,1]$.
Then $(H, d)$ is called a Hadamard space. For today: assume local compactness.
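In a Hilbert space the inequality in (2) holds with equality; a short check of this standard fact (not on the slide), with $x_t = (1-t)x_0 + t x_1$:

```latex
\begin{align*}
\|y - x_t\|^2
  &= \|(1-t)(y - x_0) + t\,(y - x_1)\|^2 \\
  &= (1-t)^2 \|y - x_0\|^2 + t^2 \|y - x_1\|^2
     + 2t(1-t)\,\langle y - x_0,\; y - x_1 \rangle .
\end{align*}
```

Substituting $\|x_0 - x_1\|^2 = \|y - x_0\|^2 + \|y - x_1\|^2 - 2\langle y - x_0, y - x_1\rangle$ into the right-hand side of (2) yields the same expression, so Hadamard spaces are exactly the complete geodesic spaces where this Hilbert-space identity is relaxed to an inequality.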
Geodesic space
Figure: a geodesic connecting $x_1$ and $x_2$, with the point $(1-\lambda)x_1 + \lambda x_2$ on it.
Definition of nonpositive curvature
Figure: a geodesic triangle in a geodesic space, with vertices $x_1$, $x_2$, $x_3$.
Terminology remark
CAT(κ) spaces, for $\kappa \in \mathbb{R}$, were introduced in 1987 by Mikhail Gromov: C = Cartan, A = Alexandrov, T = Toponogov. We are particularly interested in CAT(0) spaces.
Examples of Hadamard spaces
1. Hilbert spaces, the Hilbert ball
2. complete simply connected Riemannian manifolds with $\mathrm{Sec} \le 0$
3. $\mathbb{R}$-trees: a metric space $T$ is an $\mathbb{R}$-tree if for $x, y \in T$ there is a unique geodesic $[x, y]$, and if $[x, y] \cap [y, z] = \{y\}$, then $[x, z] = [x, y] \cup [y, z]$
4. Euclidean buildings
5. the BHV tree space (space of phylogenetic trees)
6. $L^2(M, H)$, where $(M, \mu)$ is a probability space, with $d_2(u, v) := \left( \int_M d(u(x), v(x))^2 \, d\mu(x) \right)^{1/2}$, for $u, v \in L^2(M, H)$
Convexity in Hadamard spaces
Let $(H, d)$ be a Hadamard space. These spaces allow for a natural definition of convexity:
Definition: A set $C \subseteq H$ is convex if, given $x, y \in C$, we have $[x, y] \subseteq C$.
Definition: A function $f \colon H \to (-\infty, \infty]$ is convex if $f \circ \gamma$ is a convex function for each geodesic $\gamma \colon [0,1] \to H$.
Examples of convex functions
1. The indicator function of a convex closed set $C \subseteq H$: $\iota_C(x) := 0$ if $x \in C$, and $\iota_C(x) := \infty$ if $x \notin C$.
2. The distance function to a closed convex subset $C \subseteq H$: $d_C(x) := \inf_{c \in C} d(x, c)$, $x \in H$.
3. The displacement function of an isometry $T \colon H \to H$: $\delta_T(x) := d(x, Tx)$, $x \in H$.
Examples of convex functions
4. Let $c \colon [0, \infty) \to H$ be a geodesic ray. The function $b_c \colon H \to \mathbb{R}$ defined by $b_c(x) := \lim_{t \to \infty} \left[ d(x, c(t)) - t \right]$, $x \in H$, is called the Busemann function associated to the ray $c$.
5. The energy of a mapping $u \colon M \to H$ given by $E(u) := \int_M \int_M d(u(x), u(y))^2 \, p(x, dy) \, d\mu(x)$, where $(M, \mu)$ is a measure space with a Markov kernel $p(x, dy)$. $E$ is convex and continuous on $L^2(M, H)$.
Examples of convex functions
6. Given $a_1, \dots, a_N \in H$ and weights $w_1, \dots, w_N > 0$, set $f(x) := \sum_{n=1}^N w_n \, d(x, a_n)^p$, $x \in H$, where $p \in [1, \infty)$.
If $p = 1$, we get the Fermat–Weber problem for optimal facility location; a minimizer of $f$ is called a median.
If $p = 2$, a minimizer of $f$ is the barycenter of $\mu := \sum_{n=1}^N w_n \delta_{a_n}$, or the mean of $a_1, \dots, a_N$.
Strongly convex functions
A function $f \colon H \to (-\infty, \infty]$ is strongly convex with parameter $\beta > 0$ if $f((1-t)x + ty) \le (1-t) f(x) + t f(y) - \beta\, t(1-t)\, d(x, y)^2$, for any $x, y \in H$ and $t \in [0, 1]$.
Each strongly convex function has a unique minimizer.
Example: Given $y \in H$, the function $f := d(y, \cdot)^2$ is strongly convex. Indeed, $d(y, x_t)^2 \le (1-t)\, d(y, x_0)^2 + t\, d(y, x_1)^2 - t(1-t)\, d(x_0, x_1)^2$ for each geodesic $x \colon [0,1] \to H$.
1 Basic facts on Hadamard spaces 2 Proximal point algorithm 3 Applications to computational phylogenetics
Proximal point algorithm
Let $f \colon H \to (-\infty, \infty]$ be convex and lsc. Optimization problem: $\min_{x \in H} f(x)$.
Recall: no (sub)differential, no shooting (singularities). Implicit methods are appropriate.
The PPA generates a sequence $x_i := J_{\lambda_i}(x_{i-1}) := \arg\min_{y \in H} \left[ f(y) + \frac{1}{2\lambda_i} d(y, x_{i-1})^2 \right]$, where $x_0 \in H$ is a given starting point and $\lambda_i > 0$ for each $i \in \mathbb{N}$.
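As a minimal illustration (my sketch, not from the talk): on the real line, which is a Hadamard space, take $f(x) = |x|$. Its resolvent is the soft-thresholding map, so the PPA can be run exactly:

```python
def prox_abs(x, lam):
    """Resolvent J_lam of f(x) = |x| on the real line (soft-thresholding):
    the exact minimizer of |y| + (1 / (2 * lam)) * (y - x) ** 2."""
    return max(abs(x) - lam, 0.0) * (1.0 if x > 0 else -1.0)

def ppa(x0, lambdas):
    """Run the proximal point algorithm from x0 with step sizes lambdas."""
    x = x0
    for lam in lambdas:
        x = prox_abs(x, lam)
    return x

# Constant step sizes satisfy the condition sum(lambda_i) = infinity;
# the iterates reach the unique minimizer 0 of f(x) = |x|.
x_final = ppa(5.0, [0.5] * 20)  # x_final == 0.0
```

Each iterate shrinks the current point toward 0 by exactly $\lambda_i$, which is the one-dimensional picture of the implicit step $x_i = J_{\lambda_i}(x_{i-1})$.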
Convergence of proximal point algorithm
Theorem (M.B., 2011): Let $f \colon H \to (-\infty, \infty]$ be a convex lsc function attaining its minimum. Given $x_0 \in H$ and $(\lambda_i)$ such that $\sum_{i=1}^\infty \lambda_i = \infty$, the PPA sequence $(x_i)$ converges to a minimizer of $f$. (Resolvents are firmly nonexpansive; a cheap version exists for $\lambda_i = \lambda$.)
Disadvantage: the resolvents $x_i := J_{\lambda_i}(x_{i-1}) := \arg\min_{y \in H} \left[ f(y) + \frac{1}{2\lambda_i} d(y, x_{i-1})^2 \right]$ are often difficult to compute.
Splitting proximal point algorithm
Let $f_1, \dots, f_N$ be convex lsc and consider $f(x) := \sum_{n=1}^N f_n(x)$, $x \in H$.
Example (Median and mean): $f_n := d(\cdot, a_n)$, respectively $f_n := d(\cdot, a_n)^2$.
Key idea: apply the resolvents $J^n_\lambda$ of the $f_n$ in a cyclic or random order.
Splitting proximal point algorithm
Let $x_0 \in H$ be a starting point. For each $k \in \mathbb{N}_0$ we apply the resolvents in cyclic order:
$x_{kN+1} := J^1_{\lambda_k}(x_{kN})$,
$x_{kN+2} := J^2_{\lambda_k}(x_{kN+1})$,
...
$x_{kN+N} := J^N_{\lambda_k}(x_{kN+N-1})$,
or in random order: $x_{i+1} := J^{r_i}_{\lambda_i}(x_i)$, where $(r_i)$ are random variables with values in $\{1, \dots, N\}$.
Convergence of splitting proximal point algorithm
Theorem (cyclic order version + random order version): Assume that the $f_n$ are Lipschitz (or locally Lipschitz and the minimizing sequence is bounded). Then
1. the cyclic PPA sequence converges to a minimizer of $f$;
2. the random PPA sequence converges to a minimizer of $f$ almost surely.
The assumptions are satisfied for $f(x) := \sum_{n=1}^N w_n \, d(x, a_n)^p$, $x \in H$, where $p \in [1, \infty)$.
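As an illustration of the cyclic scheme (my sketch, not code from the talk), take $f_n := d(\cdot, a_n)$ in the Euclidean plane: there the resolvent of a distance function simply moves the point at most $\lambda$ along the straight segment toward $a_n$, and the iterates approach the geometric median (Fermat point):

```python
import math

def prox_dist(x, a, lam):
    # Resolvent of f = d(., a): move from x toward a by distance lam,
    # or all the way to a if d(x, a) <= lam.
    dist = math.dist(x, a)
    if dist <= lam:
        return list(a)
    t = lam / dist
    return [(1 - t) * xi + t * ai for xi, ai in zip(x, a)]

def cyclic_sppa_median(points, n_cycles=2000):
    # Cyclic splitting PPA for f(x) = sum_n d(x, a_n).
    x = list(points[0])
    for k in range(n_cycles):
        lam = 1.0 / (k + 1)        # diminishing steps with divergent sum
        for a in points:           # one pass through the f_n in cyclic order
            x = prox_dist(x, a, lam)
    return x

# The minimizer is the Fermat point of the triangle (0,0), (1,0), (0,1).
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
med = cyclic_sppa_median(pts)
```

For this triangle the optimal total distance is $\sqrt{2 + \sqrt{3}} \approx 1.932$, which the iterates approach as the step sizes shrink.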
Splitting proximal point algorithm (for the mean)
Hence instead of computing (the usual PPA) $x_{i+1} := \arg\min_{z \in H} \left[ \sum_{n=1}^N d(z, a_n)^2 + \frac{1}{2\lambda_i} d(z, x_i)^2 \right]$,
we minimize $x_{i+1} := \arg\min_{z \in H} \left[ d(z, a_n)^2 + \frac{1}{2\lambda_i} d(z, x_i)^2 \right]$, where the $a_n$ are chosen in a cyclic or random order.
This is a one-dimensional problem! Hence $x_{i+1}$ is a convex combination of $a_n$ and $x_i$.
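The convex combination can be made explicit (a standard computation, not spelled out on the slide). By convexity the minimizer lies on the geodesic from $x_i$ to $a_n$, so with $x_t := (1-t)\,x_i + t\,a_n$ and $D := d(x_i, a_n)$:

```latex
g(t) := d(x_t, a_n)^2 + \frac{1}{2\lambda_i}\, d(x_t, x_i)^2
      = (1-t)^2 D^2 + \frac{t^2}{2\lambda_i}\, D^2 ,
\qquad
g'(t^*) = 0 \;\Longleftrightarrow\; t^* = \frac{2\lambda_i}{1 + 2\lambda_i},
```

so $x_{i+1} = \frac{1}{1+2\lambda_i}\, x_i + \frac{2\lambda_i}{1+2\lambda_i}\, a_n$. In particular, the choice $\lambda_i := \frac{1}{2i}$ gives $x_{i+1} = \frac{i}{i+1}\, x_i + \frac{1}{i+1}\, a_n$, a running (inductive) mean.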
1 Basic facts on Hadamard spaces 2 Proximal point algorithm 3 Applications to computational phylogenetics
Left: One of many evolutionary trees Right: A picture of an evolutionary tree by Charles Darwin (1837)
Billera–Holmes–Vogtmann tree space $\mathcal{T}_d$
Figure: 5 out of 15 orthants of $\mathcal{T}_4$, each corresponding to one tree topology on the leaves 0, 1, 2, 3, 4.
Billera–Holmes–Vogtmann tree space $\mathcal{T}_d$
Figure: A finite set of trees in $\mathcal{T}_4$
Billera–Holmes–Vogtmann tree space $\mathcal{T}_d$
Figure: Consider the most frequent tree topology only
Computing the mean: random order version
Algorithm (SPPA with $f_n := d(\cdot, T_n)^2$)
Input: $T_1, \dots, T_N \in \mathcal{T}_d$
Step 1: $S_1 := T_1$ and $i := 1$
Step 2: choose $r \in \{1, \dots, N\}$ at random
Step 3: $S_{i+1} := \frac{1}{i+1} T_r + \frac{i}{i+1} S_i$
Step 4: $i := i + 1$
Step 5: go to Step 2
Geodesics can be computed in polynomial time: the Owen–Provan algorithm (2011).
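A sketch of this random-order algorithm in the Euclidean plane (my illustration; in the BHV tree space the convex-combination step in Step 3 would be computed along an Owen–Provan geodesic). In Euclidean space the mean is the centroid, which the iterates approach:

```python
import random

def convex_combination(a, b, t):
    # Point at parameter t on the straight-line geodesic from a to b.
    # In the BHV tree space this step would use the Owen-Provan algorithm.
    return tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b))

def sppa_mean(points, n_iter=50_000, seed=0):
    rng = random.Random(seed)
    s = points[0]                         # Step 1: S_1 := T_1
    for i in range(1, n_iter):
        t_r = rng.choice(points)          # Step 2: pick a sample at random
        s = convex_combination(s, t_r, 1.0 / (i + 1))   # Step 3
    return s

pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
mean = sppa_mean(pts)   # close to the centroid (1.0, 1.0)
```

The parameter $\frac{1}{i+1}$ in Step 3 corresponds to the proximal step sizes $\lambda_i = \frac{1}{2i}$, so each iteration is one resolvent of a single $f_n = d(\cdot, T_n)^2$.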
Computing the mean: random order version
Figure: sample trees $T_1, \dots, T_7$ and the successive iterates $S_1, S_2, S_3, S_4$ moving along geodesics toward randomly chosen sample trees.
Le Big Data
Space $\mathcal{T}_d$: orthant dimension $= d - 2$, number of orthants $= (2d-3)!!$.
The actual dimension of $\mathcal{T}_d$ is $d + 1 + (d-2)(2d-3)!!$.
Number of trees $= N$ (e.g., coming from an MCMC simulation).
Our computation: $d = 12$, hence $\dim \approx 10^{11}$, and $N = 10^5$.
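The counts on this slide can be checked directly (formulas as stated on the slide):

```python
from math import prod

def double_factorial_odd(m):
    # m!! for odd m >= 1: the product 1 * 3 * 5 * ... * m
    return prod(range(1, m + 1, 2))

d = 12                                        # number of leaves used in the talk
n_orthants = double_factorial_odd(2 * d - 3)  # (2d - 3)!! orthant count
dim = d + 1 + (d - 2) * n_orthants            # the slide's dimension formula

# n_orthants = 13_749_310_575 and dim = 137_493_105_763, i.e. about 1.4e11,
# matching the "dim ~ 10^11" claim for d = 12.
```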
Results (courtesy of Philipp Benner)
Figure: four phylogenetic trees on 13 mammalian taxa (Pika, Rabbit, Squirrel, Guinea pig, Kangaroo rat, Rat, Mouse, Marmoset, Baboon, Orangutan, Gorilla, Chimpanzee, Human).
Results (courtesy of Philipp Benner), continued
Figure: Approximation of the mean of the 100,000 trees.
References
1. M. Bačák: Computing medians and means in Hadamard spaces, SIAM J. Optim. 24 (2014), no. 3, 1542–1566.
2. Benner, Bačák, Bourguignon: Point estimates in phylogenetic reconstructions, Bioinformatics 30 (2014), issue 17.
3. M. Bačák: Convex analysis and optimization in Hadamard spaces, De Gruyter, Berlin, 2014.