Gradient flows and a Trotter–Kato formula of semi-convex functions on CAT(1)-spaces


Shin-ichi Ohta & Miklós Pálfia

November 20, 2015

Shin-ichi Ohta: Department of Mathematics, Kyoto University, Kyoto 606-8502, Japan (sohta@math.kyoto-u.ac.jp); supported in part by the Grant-in-Aid for Young Scientists (B) 3740048.

Miklós Pálfia: Department of Mathematics, Kyoto University, Kyoto 606-8502, Japan (palfia.miklos@aut.bme.hu); supported in part by the SGU project of Kyoto University, the JSPS international research fellowship, JSPS KAKENHI Grant No. 14F0430, the National Research Foundation of Korea (NRF) Grant No. 015R1A3A031159 funded by the Korea government (MEST), and the Hungarian National Research Fund OTKA K-115383.

Abstract

We generalize the theory of gradient flows of semi-convex functions on CAT(0)-spaces, developed by Mayer and Ambrosio–Gigli–Savaré, to CAT(1)-spaces. The key tool is the so-called commutativity representing a Riemannian nature of the space, and all results hold true also for metric spaces satisfying the commutativity with semi-convex squared distance functions. Our approach combining the semi-convexity of the squared distance function with a Riemannian property of the space seems to be of independent interest, and can be compared with Savaré's work on the local angle condition under lower curvature bounds. Applications include the convergence of the discrete variational scheme to a unique gradient curve, the contraction property and the evolution variational inequality of the gradient flow, and a Trotter–Kato product formula for pairs of semi-convex functions.

1 Introduction

The theory of gradient flows in singular spaces is a field of active research having applications in various fields. For instance, regarding heat flow as gradient flow of the relative entropy in the ($L^2$-)Wasserstein space, initiated by Jordan, Kinderlehrer and Otto [JKO], is known as a useful technique in partial differential equations (see [Ot], [Vi1], [AGS1], [ASZ] among many others), and has played a crucial role in the recent remarkable development of geometric analysis on metric measure spaces satisfying the (Riemannian) curvature-dimension condition (see [Vi2], [Gi1], [GKO], [AGS2], [AGS3], [EKS]). One of the recent striking achievements is Gigli's splitting theorem [Gi3] (see also a survey [Gi2]), in which the gradient flow of the Busemann function is used in an impressive way.

We follow the strategy of constructing a gradient flow of a lower semi-continuous, semi-convex function $\phi$ on a metric space $(X, d)$ via the discrete variational scheme employing the Moreau–Yosida approximation:

\[ \phi_\tau(x) := \inf_{z \in X} \Big\{ \phi(z) + \frac{d^2(x, z)}{2\tau} \Big\}, \qquad x \in X,\ \tau > 0. \tag{1.1} \]

A point $x_\tau$ attaining the above infimum is considered as an approximation of the point $\xi(\tau)$ on the gradient curve $\xi$ of $\phi$ with $\xi(0) = x$. Here $\phi$ is said to be semi-convex if it is $\lambda$-convex for some $\lambda \in \mathbb{R}$, meaning that

\[ \phi\big(\gamma(t)\big) \le (1 - t)\phi\big(\gamma(0)\big) + t\phi\big(\gamma(1)\big) - \frac{\lambda}{2}(1 - t)t\, d^2\big(\gamma(0), \gamma(1)\big) \]

along geodesics $\gamma : [0, 1] \to X$. Since the approximation scheme is based on the distance function, finer properties of the distance function provide finer analysis of gradient flows. In [Jo], [Ma] (with applications to harmonic maps) and [AGS1] (with applications to the Wasserstein spaces), gradient flows in CAT(0)-spaces (non-positively curved metric spaces) are well investigated; see also Bačák's recent book [Ba2]. There the 2-convexity of the squared distance function, which is indeed the definition of CAT(0)-spaces, played essential roles.

We shall generalize their theory to CAT(1)-spaces (metric spaces of sectional curvature $\le 1$), where the distance functions are only semi-convex. CAT(1)-spaces can have a more complicated global structure than CAT(0)-spaces. For instance, all CAT(0)-spaces are contractible while CAT(1)-spaces may not be. It is known that the direct application of the techniques of CAT(0)-spaces to CAT(1)-spaces does not work. The point is that the $K$-convexity of the squared distance function with $K < 2$ holds true on some non-Hilbert Banach spaces, on which the behavior of gradient flows is much less understood. We overcome this difficulty by introducing the notion of commutativity representing a Riemannian nature of the space. Precisely, the key ingredients of our analysis are the following properties of CAT(1)-spaces:

(A) The commutativity:

\[ \lim_{s \to 0} \frac{d^2(\gamma(s), z) - d^2(x, z)}{s} = \lim_{t \to 0} \frac{d^2(\eta(t), y) - d^2(x, y)}{t} \tag{1.2} \]

for geodesics $\gamma$ and $\eta$ with $x = \gamma(0) = \eta(0)$, $\gamma(1) = y$ and $\eta(1) = z$ (see (3.1) in the proof of Lemma 3.1);

(B) The semi-convexity of the squared distance function (see Lemma 2.8).

(One can more generally consider some other family of curves along which (A) and (B) hold, as was essentially used to study the Wasserstein spaces in [AGS1].) This approach, reinforcing the semi-convexity with the Riemannian nature of the space, seems to be of independent interest and is in a similar spirit to Savaré's work [Sa] based on the semi-concavity of distance functions and the local angle condition; those properties fit the study of spaces with lower curvature bounds. The semi-convexity and semi-concavity are usually studied in the context of Banach space theory and Finsler geometry, see [BCL], [Oh1], [Oh3]. In the CAT(0)-setting, the commutativity follows from the 2-convexity of the squared distance function and the role of the commutativity is implicit. The commutativity is not true in non-Riemannian Finsler manifolds (see Remark 3.2(b)). Actually, the lack of the commutativity is the reason why the first author and Sturm introduced the notion of skew-convexity in [OS] to study the contraction property of gradient flows in Finsler manifolds. In contrast, on the Wasserstein space over a Riemannian manifold, we have a sort of Riemannian structure but the convexity of the squared distance function fails. See [Gi3, B] for a connection between (1.2) and the infinitesimal Hilbertianity, which is an almost everywhere notion of Riemannian nature.

Modifying the calculation in [AGS1] with the help of the commutativity, we arrive at the key estimate for a $\lambda$-convex function $\phi$ and a $K$-convex squared distance (see Lemma 3.1):

\[ d^2(x_\tau, y) \le d^2(x, y) - \lambda\tau\, d^2(x_\tau, y) + 2\tau\{\phi(y) - \phi(x_\tau)\} - \frac{K}{2} d^2(x, x_\tau), \]

where $x_\tau$ is a point attaining the infimum in (1.1). Surprisingly, even with $K < 0$ (as well as $\lambda < 0$), this estimate is enough to generalize the argument of [AGS1]. We show the convergence of the discrete variational scheme to a unique gradient curve (Theorem 4.4), the contraction property (Theorem 4.7) and the evolution variational inequality (Theorem 4.8) of the gradient flow. Moreover, along the lines of [Ma] and [CM], we study the large time behavior of the flow (§4.5) and prove a Trotter–Kato product formula for pairs of semi-convex functions (Theorem 5.4). The latter is a two-fold generalization of the existing results in [CM], [Sto] and [Ba1] for convex functions on CAT(0)-spaces (to be precise, an inequality corresponding to our key estimate with $\lambda = 0$ and $K = 2$ is an assumption of [CM]).

We stress that we use only the qualitative properties of CAT(1)-spaces instead of the direct curvature condition. Thus our technique also applies to every metric space satisfying the conditions (A) and (B) above, under a mild coercivity assumption on $\phi$ (see Case II at the beginning of Section 4). This case could be more important than the CAT(1)-setting, because in CAT(1)-spaces the squared distance function is locally $K$-convex with $K > 0$, which makes some discussions easier with the help of the globalization technique (see §4.6).

We finally mention some more related works. Gradient flow in metric spaces with lower sectional curvature bounds (Alexandrov spaces) is investigated in [PP], [Ly]. This technique was generalized to the Wasserstein spaces over Alexandrov spaces in [Oh2] and [Sa]. Sturm [Stu] recently studied gradient flows in metric measure spaces satisfying the Riemannian curvature-dimension condition. Discrete-time gradient flow is also an important subject related to optimization theory, for which we refer to [OP] and the references therein.

The organization of the article is as follows. In Section 2, we recall preliminary results on gradient flows in metric spaces from [AGS1], followed by the necessary facts on CAT(1)-spaces. Section 3 is devoted to our key estimate. We apply the key estimate to the study of gradient flows in Section 4, and prove a Trotter–Kato product formula in Section 5.
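Properties (A) and (B) can be sanity-checked numerically on the unit sphere $S^2$, the model CAT(1)-space. The following Python sketch is only an illustration: the helper names, the sample points, the finite-difference step `h`, and the explicit constant $K = 2R\cot R$ (read off from the Hessian of the squared distance on the sphere for $R < \pi/2$) are assumptions of this example and are not taken from the article.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dist(u, v):
    """Great-circle distance on the unit sphere."""
    return math.acos(max(-1.0, min(1.0, dot(u, v))))

def geodesic(p, q, t):
    """Point at parameter t on the minimal geodesic from p to q (0 < d(p, q) < pi)."""
    w = dist(p, q)
    a = math.sin((1.0 - t) * w) / math.sin(w)
    b = math.sin(t * w) / math.sin(w)
    return tuple(a * pc + b * qc for pc, qc in zip(p, q))

def polar(r, theta):
    """Point at distance r from the north pole in direction theta."""
    return (math.sin(r) * math.cos(theta), math.sin(r) * math.sin(theta), math.cos(r))

x = (0.0, 0.0, 1.0)  # base point (north pole)

# (A) Commutativity: one-sided difference quotients of both sides of (1.2).
y, z = polar(0.8, 0.3), polar(1.1, 1.9)
h = 1e-6  # hypothetical finite-difference step
lhs = (dist(geodesic(x, y, h), z) ** 2 - dist(x, z) ** 2) / h
rhs = (dist(geodesic(x, z, h), y) ** 2 - dist(x, y) ** 2) / h
# Common value predicted by the first variation formula: -2 d(x,y) d(x,z) cos(angle).
expected = -2.0 * 0.8 * 1.1 * math.cos(1.9 - 0.3)

# (B) K-convexity of d^2(x, .) on the ball B(x, R), with the assumed K = 2 R cot R.
R = 1.0
K = 2.0 * R / math.tan(R)
random.seed(0)
min_slack = float("inf")
for _ in range(2000):
    p = polar(R * random.random(), 2.0 * math.pi * random.random())
    q = polar(R * random.random(), 2.0 * math.pi * random.random())
    if dist(p, q) < 1e-6:
        continue  # skip nearly coincident endpoints
    t = random.random()
    upper = ((1.0 - t) * dist(x, p) ** 2 + t * dist(x, q) ** 2
             - 0.5 * K * (1.0 - t) * t * dist(p, q) ** 2)
    min_slack = min(min_slack, upper - dist(x, geodesic(p, q, t)) ** 2)

print(lhs, rhs, expected, min_slack)
```

The two difference quotients should agree up to the discretization error, and `min_slack` should be non-negative up to rounding if the convexity inequality holds at all sampled points.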

2 Preliminaries

Let $(X, d)$ be a complete metric space. A curve $\gamma : [0, 1] \to X$ is called a geodesic if it is locally minimizing and of constant speed. We call $\gamma$ a minimal geodesic if it is globally minimizing, namely $d(\gamma(s), \gamma(t)) = |s - t|\, d(\gamma(0), \gamma(1))$ for all $s, t \in [0, 1]$. We say that $(X, d)$ is geodesic if any two points $x, y \in X$ admit a minimal geodesic between them.

2.1 Gradient flows in metric spaces

We recall basic facts on the construction of gradient curves in metric spaces. We follow the technique called the minimizing movements going back to (at least) De Giorgi [DG], see [AGS1] for more on this theory. We also refer to [Br], [CL] for classical theories on linear spaces.

2.1.1 Discrete solutions

As our potential function, we always consider a lower semi-continuous function $\phi : X \to (-\infty, \infty]$ such that $D(\phi) := X \setminus \phi^{-1}(\infty) \neq \emptyset$. Given $x \in X$ and $\tau > 0$, we define the Moreau–Yosida approximation:

\[ \phi_\tau(x) := \inf_{z \in X} \Big\{ \phi(z) + \frac{d^2(x, z)}{2\tau} \Big\} \]

and set

\[ J_\tau^\phi(x) := \Big\{ z \in X \ \Big|\ \phi(z) + \frac{d^2(x, z)}{2\tau} = \phi_\tau(x) \Big\}. \]

For $x \in D(\phi)$ and $z \in J_\tau^\phi(x)$ (if $J_\tau^\phi(x) \neq \emptyset$), it is straightforward from $\phi(z) + d^2(x, z)/(2\tau) \le \phi(x)$ that $\phi(z) \le \phi(x)$ and $d^2(x, z) \le 2\tau\{\phi(x) - \phi(z)\}$. We consider two kinds of conditions on $\phi$.

Assumption 2.1
(1) There exists $\tau_*(\phi) \in (0, \infty]$ such that $\phi_\tau(x) > -\infty$ and $J_\tau^\phi(x) \neq \emptyset$ for all $x \in X$ and $\tau \in (0, \tau_*(\phi))$ (coercivity).
(2) For any $Q \in \mathbb{R}$, bounded subsets of the sub-level set $\{x \in X \mid \phi(x) \le Q\}$ are relatively compact in $X$ (compactness).

We remark that, if $\phi_{\tau_*}(x_*) > -\infty$ for some $x_* \in X$ and $\tau_* > 0$, then $\phi_\tau(x) > -\infty$ for every $x \in X$ and $\tau \in (0, \tau_*)$ (see [AGS1, Lemma 2.2.1]). Then, if the compactness (2) holds, we have $J_\tau^\phi(x) \neq \emptyset$ by the lower semi-continuity of $\phi$ (see [AGS1, Corollary 2.2.2]).

Remark 2.2 If $\operatorname{diam} X < \infty$ and the compactness (2) holds, then the lower semi-continuity of $\phi$ implies that every sub-level set $\{x \in X \mid \phi(x) \le Q\}$ is (empty or) compact. Thus $\phi$ is bounded below and we can take $\tau_*(\phi) = \infty$.
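To make the Moreau–Yosida approximation concrete, here is a minimal Python sketch on the real line with $\phi(z) = |z|$, where the resolvent $J_\tau^\phi$ is the classical soft-thresholding map. The grid-based minimization, the function names and the sample values `x = 1.7`, `tau = 0.5` are assumptions chosen for illustration; they do not come from the article.

```python
import math

def moreau_yosida(phi, x, tau, candidates):
    """Approximate phi_tau(x) and a point of J_tau(x) by minimizing over `candidates`."""
    z = min(candidates, key=lambda c: phi(c) + (x - c) ** 2 / (2.0 * tau))
    return phi(z) + (x - z) ** 2 / (2.0 * tau), z

def soft_threshold(x, tau):
    """Closed-form resolvent of phi(z) = |z| on the real line."""
    return math.copysign(max(abs(x) - tau, 0.0), x)

phi = abs
grid = [i / 1000.0 for i in range(-3000, 3001)]  # candidate points in [-3, 3]
x, tau = 1.7, 0.5

value, z = moreau_yosida(phi, x, tau, grid)
closed_form = soft_threshold(x, tau)

# Elementary bound from the text: d^2(x, z) <= 2 tau {phi(x) - phi(z)}.
bound_holds = (x - z) ** 2 <= 2.0 * tau * (phi(x) - phi(z)) + 1e-12

print(value, z, closed_form, bound_holds)
```

For $|x| \ge \tau$ the value $\phi_\tau(x)$ equals $|x| - \tau/2$ for this particular $\phi$, which gives an independent check of the brute-force minimization.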

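To construct discrete approximations of gradient curves of $\phi$, we consider a partition of the interval $[0, \infty)$:

\[ P_\tau = \{0 = t_\tau^0 < t_\tau^1 < \cdots\}, \qquad \lim_{k \to \infty} t_\tau^k = \infty, \]

and set

\[ \tau_k := t_\tau^k - t_\tau^{k-1} \ \text{ for } k \in \mathbb{N}, \qquad |\tau| := \sup_{k \in \mathbb{N}} \tau_k. \]

We will always assume $|\tau| < \tau_*(\phi)$. Given an initial point $x_0 \in D(\phi)$, we set $x_\tau^0 := x_0$ and

\[ \text{recursively choose arbitrary } x_\tau^k \in J_{\tau_k}^\phi(x_\tau^{k-1}) \text{ for each } k \in \mathbb{N}. \tag{2.1} \]

We call $\{x_\tau^k\}_{k \ge 0}$ a discrete solution of the variational scheme (2.1) associated with the partition $P_\tau$, which is thought of as a discrete-time gradient curve for the potential function $\phi$. The following a priori estimates (see [AGS1, Lemma 3.2.2]) will be useful in the sequel. We remark that these estimates are easily obtained if $\phi$ is bounded below.

Lemma 2.3 (A priori estimates) Let $\phi : X \to (-\infty, \infty]$ satisfy Assumption 2.1(1). Then, for any $x_* \in X$ and $Q, T > 0$, there exists a constant $C = C(x_*, \tau_*(\phi), Q, T) > 0$ such that, if a partition $P_\tau$ and an associated discrete solution $\{x_\tau^k\}_{k \ge 0}$ of (2.1) satisfy

\[ \phi(x_\tau^0) \le Q, \qquad d^2(x_\tau^0, x_*) \le Q, \qquad t_\tau^N \le T, \qquad |\tau| \le \frac{\tau_*(\phi)}{8}, \]

then we have for any $1 \le k \le N$

\[ d^2(x_\tau^k, x_*) \le C, \qquad \sum_{l=1}^{k} \frac{d^2(x_\tau^{l-1}, x_\tau^l)}{2\tau_l} \le \phi(x_\tau^0) - \phi(x_\tau^k) \le C. \]

In particular, for all $1 \le k \le N$, we have $d^2(x_\tau^{k-1}, x_\tau^k) \le 2C\tau_k$ and

\[ d^2(x_\tau^0, x_\tau^k) \le \Big( \sum_{l=1}^{k} d(x_\tau^{l-1}, x_\tau^l) \Big)^2 \le \sum_{l=1}^{k} \frac{d^2(x_\tau^{l-1}, x_\tau^l)}{\tau_l} \cdot \sum_{l=1}^{k} \tau_l \le 2C t_\tau^k. \tag{2.2} \]

2.1.2 Convergence of discrete solutions

From here on, let $\phi : X \to (-\infty, \infty]$ be $\lambda$-convex (also called $\lambda$-geodesically convex) for some $\lambda \in \mathbb{R}$ in the sense that

\[ \phi\big(\gamma(t)\big) \le (1 - t)\phi(x) + t\phi(y) - \frac{\lambda}{2}(1 - t)t\, d^2(x, y) \tag{2.3} \]

for any $x, y \in D(\phi)$ and some minimal geodesic $\gamma : [0, 1] \to X$ from $x$ to $y$. (The existence of a minimal geodesic between two points in $D(\phi)$ is included in the definition, thus in particular $(D(\phi), d)$ is geodesic.) We remark that the compactness (2) in Assumption 2.1 implies the coercivity (1) in this case (see [AGS1, Lemma 2.4.8]; we even have $\tau_*(\phi) = \infty$ if $\lambda \ge 0$).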
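The variational scheme (2.1) can be illustrated on the real line with the $\lambda$-convex potential $\phi(x) = (\lambda/2)x^2$, for which each step has the closed form $x \mapsto x/(1 + \lambda\tau)$ and the gradient curve is $\xi(t) = x_0 e^{-\lambda t}$. The Python sketch below is a toy instance under these assumptions (uniform partitions, the chosen values of `lam`, `x0` and `T`, and the function names are all hypothetical), not an implementation from the article.

```python
import math

def resolvent(x, tau, lam):
    """Unique minimizer of z -> (lam/2) z^2 + (x - z)^2 / (2 tau), assuming 1 + lam*tau > 0."""
    return x / (1.0 + lam * tau)

def discrete_solution(x0, taus, lam):
    """Discrete solution x^0, x^1, ... of the variational scheme for the given step sizes."""
    xs = [x0]
    for tau in taus:
        xs.append(resolvent(xs[-1], tau, lam))
    return xs

lam, x0, T = -1.0, 1.0, 1.0          # lam < 0: semi-convex but not convex
exact = x0 * math.exp(-lam * T)      # gradient curve xi(t) = x0 * exp(-lam * t) at t = T

errors = []
for n in (10, 100, 1000):            # uniform partitions of [0, T] with step T / n
    xs = discrete_solution(x0, [T / n] * n, lam)
    errors.append(abs(xs[-1] - exact))

print(errors)
```

The error at time `T` should shrink as the partition is refined, in line with the convergence of discrete solutions discussed in the text.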

Fix an initial point $x_0 \in D(\phi)$. Take a sequence of partitions $\{P_{\tau_i}\}_{i \in \mathbb{N}}$ such that $\lim_{i \to \infty} |\tau_i| = 0$ and associated discrete solutions $\{x_{\tau_i}^k\}_{k \ge 0}$ with $x_{\tau_i}^0 = x_0$. Under Assumption 2.1(2), by the compactness argument ([AGS1, Proposition 2.2.3]), a subsequence of the interpolated curves

\[ \bar{x}_{\tau_i}(0) := x_0, \qquad \bar{x}_{\tau_i}(t) := x_{\tau_i}^k \ \text{ for } t \in (t_{\tau_i}^{k-1}, t_{\tau_i}^k] \tag{2.4} \]

converges to a curve $\xi : [0, \infty) \to D(\phi)$ point-wise in $t \in [0, \infty)$. In general, under the coercivity and $\lambda$-convexity of $\phi$ (but without the compactness), if a curve $\xi$ is obtained as above (called a generalized minimizing movement; see [AGS1, Definition 2.0.6]), then it is locally Lipschitz on $(0, \infty)$ and satisfies $\lim_{t \to 0} \xi(t) = x_0$ as well as the energy dissipation identity:

\[ \phi\big(\xi(T)\big) = \phi\big(\xi(S)\big) - \frac{1}{2} \int_S^T \big\{ |\dot{\xi}|^2 + |\nabla\phi|^2(\xi) \big\}\, dt \tag{2.5} \]

for all $0 < S < T$. Here

\[ |\dot{\xi}|(t) := \lim_{s \to t} \frac{d(\xi(s), \xi(t))}{|t - s|} \]

is the metric speed existing at almost all $t$, and

\[ |\nabla\phi|(x) := \limsup_{y \to x} \frac{\max\{\phi(x) - \phi(y), 0\}}{d(x, y)} \]

is the (descending) local slope (see [AGS1, Theorem 2.4.15]). We remark that $|\nabla\phi|$ is lower semi-continuous ([AGS1, Corollary 2.4.10]) and $\lim_{i \to \infty} \phi(\bar{x}_{\tau_i}(t)) = \phi(\xi(t))$ for all $t \ge 0$ ([AGS1, Theorem 2.3.3]). The equation (2.5) can be thought of as a metric formulation of the differential equation $\dot{\xi}(t) = -\nabla\phi(\xi(t))$, thus $\xi$ will be called a gradient curve of $\phi$ starting from $x_0$. We remark that one does not have uniqueness of gradient curves in this generality (see [AG, Example 4.3] for a simple example in the l -space).

Remark 2.4 One can relax the scheme by allowing varying initial points: $x_{\tau_i}^0 \neq x_0$. Then assuming $x_{\tau_i}^0 \to x_0$ and $\phi(x_{\tau_i}^0) \to \phi(x_0)$ yields the same convergence results. In our setting, such a convergence can also follow from the comparison estimate (4.2) (see the proof of Theorem 4.4).

2.2 CAT(1)-spaces

We refer to [BBI] for the basics of CAT(1)-spaces and for more general metric geometry. Given three points $x, y, z \in X$ with $d(x, y) + d(y, z) + d(z, x) < 2\pi$, we can take corresponding points $\tilde{x}, \tilde{y}, \tilde{z}$ in the 2-dimensional unit sphere $S^2$ (uniquely up to rigid motions) such that

\[ d_{S^2}(\tilde{x}, \tilde{y}) = d(x, y), \qquad d_{S^2}(\tilde{y}, \tilde{z}) = d(y, z), \qquad d_{S^2}(\tilde{z}, \tilde{x}) = d(z, x). \]

We call $\triangle \tilde{x}\tilde{y}\tilde{z}$ a comparison triangle of $\triangle xyz$ in $S^2$.
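A comparison triangle can be constructed explicitly. The Python sketch below places $\tilde{x}$ at the north pole, $\tilde{y}$ on a fixed meridian, and determines $\tilde{z}$ from the angle at $\tilde{x}$ given by the spherical law of cosines. The function names and the sample side lengths are assumptions for illustration; the construction itself is the standard one and is not quoted from the article.

```python
import math

def comparison_triangle(a, b, c):
    """Vertices on the unit sphere with d(x,y) = a, d(y,z) = b, d(z,x) = c.

    Assumes a, b, c > 0, the triangle inequalities, and a + b + c < 2*pi.
    """
    if not (a + b + c < 2.0 * math.pi):
        raise ValueError("perimeter must be smaller than 2*pi")
    # Angle at x from the spherical law of cosines:
    # cos b = cos a cos c + sin a sin c cos(alpha).
    cos_alpha = (math.cos(b) - math.cos(a) * math.cos(c)) / (math.sin(a) * math.sin(c))
    alpha = math.acos(max(-1.0, min(1.0, cos_alpha)))
    x = (0.0, 0.0, 1.0)
    y = (math.sin(a), 0.0, math.cos(a))
    z = (math.sin(c) * math.cos(alpha), math.sin(c) * math.sin(alpha), math.cos(c))
    return x, y, z

def sphere_dist(u, v):
    """Great-circle distance between unit vectors u and v."""
    inner = sum(p * q for p, q in zip(u, v))
    return math.acos(max(-1.0, min(1.0, inner)))

x, y, z = comparison_triangle(1.0, 1.2, 0.9)
sides = (sphere_dist(x, y), sphere_dist(y, z), sphere_dist(z, x))
print(sides)
```

The printed side lengths should reproduce the inputs `(1.0, 1.2, 0.9)` up to rounding; any rotation or reflection of the three vertices gives another valid comparison triangle.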

Definition 2.5 (CAT(1)-spaces) A geodesic metric space $(X, d)$ is called a CAT(1)-space if, for any $x, y, z \in X$ with $d(x, y) + d(y, z) + d(z, x) < 2\pi$ and any minimal geodesic $\gamma : [0, 1] \to X$ from $y$ to $z$, we have

\[ d\big(x, \gamma(t)\big) \le d_{S^2}\big(\tilde{x}, \tilde{\gamma}(t)\big) \]

at all $t \in [0, 1]$, where $\triangle \tilde{x}\tilde{y}\tilde{z} \subset S^2$ is a comparison triangle of $\triangle xyz$ and $\tilde{\gamma} : [0, 1] \to S^2$ is the minimal geodesic from $\tilde{y}$ to $\tilde{z}$.

It is readily observed from the definition that each pair of points $x, y \in X$ with $d(x, y) < \pi$ is joined by a unique minimal geodesic. Fundamental examples of CAT(1)-spaces are the following.

Example 2.6
(1) A complete, simply connected Riemannian manifold endowed with the Riemannian distance is a CAT(1)-space if and only if its sectional curvature is not greater than 1 everywhere.
(2) All CAT(0)-spaces, which are defined similarly to Definition 2.5 by replacing $S^2$ with $\mathbb{R}^2$, are CAT(1)-spaces. Important examples of CAT(0)-spaces are Hadamard manifolds, Hilbert spaces, trees and Euclidean buildings.
(3) Further examples of CAT(1)-spaces include orbifolds obtained as quotient spaces of CAT(1)-manifolds, and spherical buildings. See [BBI, 9.1] for more examples.

Remark 2.7 For general $\kappa \in \mathbb{R}$, CAT($\kappa$)-spaces are defined in the same manner by employing comparison triangles in the 2-dimensional space form of constant curvature $\kappa$. If $(X, d)$ is a CAT($\kappa$)-space, then it is also CAT($\kappa'$) for all $\kappa' > \kappa$ and the scaled metric space $(X, cd)$ with $c > 0$ is CAT($c^{-2}\kappa$). Therefore considering CAT(1)-spaces covers all CAT($\kappa$)-spaces up to scaling.

As was mentioned in the introduction, the properties of CAT(1)-spaces needed in our discussion are only the semi-convexity of the squared distance function and the commutativity (1.2). The latter is a consequence of the first variation formula. Let us review them.

Lemma 2.8 (Semi-convexity of distance functions) Let $(X, d)$ be a CAT(1)-space and take $R \in (0, \pi)$. Then there exists $K = K(R) \in \mathbb{R}$ such that the squared distance function $d^2(x, \cdot)$ is $K$-convex on the open $R$-ball $B(x, R)$ for all $x \in X$.

Proof. By the definition of CAT(1)-spaces, it is enough to show the claim in $S^2$. Then the $K$-convexity is a direct consequence of the smoothness of $d_{S^2}^2(\tilde{x}, \cdot)$ on $B(\tilde{x}, \pi)$.

Clearly $K(R) > 0$ if $R < \pi/2$ (see, e.g., [Oh1] for the precise estimate) and $K(R) < 0$ if $R > \pi/2$.

We can define the angle between two geodesics $\gamma$ and $\eta$ emanating from the same point $\gamma(0) = \eta(0) = x$ by

\[ \angle_x(\gamma, \eta) := \lim_{s, t \to 0} \tilde{\angle}\, \gamma(s)\, x\, \eta(t), \]

where $\tilde{\angle}\, \gamma(s)\, x\, \eta(t)$ is the angle at $\tilde{x}$ of a comparison triangle of $\triangle \gamma(s)\, x\, \eta(t)$ in $S^2$. By the definition of the angle, we obtain the following (see [BBI, Theorem 4.5.6, Remark 4.5.1]).

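Theorem 2.9 (First variation formula) Let $\gamma : [0, 1] \to X$ be a geodesic from $x$ to $z$, and take $y \in X$ with $0 < d(x, y) < \pi$. Then we have

\[ \lim_{s \to 0} \frac{d(\gamma(s), y) - d(x, y)}{s} = -d(x, z) \cos\angle_x(\gamma, \eta), \]

where $\eta : [0, 1] \to X$ is the unique minimal geodesic from $x$ to $y$.

3 Key lemma

In this section, let $(X, d)$ be a complete CAT(1)-space and $\phi : X \to (-\infty, \infty]$ satisfy the $\lambda$-convexity for some $\lambda \in \mathbb{R}$ and Assumption 2.1(1). Note that (2.3) holds along every minimal geodesic since minimal geodesics are unique between points of distance $< \pi$. The next lemma, generalizing [AGS1, Theorem 4.1.2(ii)] to the case where both $\lambda$ and $K$ can be negative, will be a key tool in the following sections.

Lemma 3.1 (Key lemma) Let $x \in D(\phi)$ and $\tau \in (0, \min\{\pi^2/(2C), \tau_*(\phi)/8\})$ with $C = C(x, \tau_*(\phi), \phi(x), \tau_*(\phi)/8)$ from Lemma 2.3. Take $x_\tau \in J_\tau^\phi(x)$. Then we have, for any $y \in D(\phi) \cap B(x, R - d(x, x_\tau))$ with $R < \pi$ and for $K = K(R)$ as in Lemma 2.8,

\[
\begin{aligned}
d^2(x_\tau, y) &\le d^2(x, y) - \lambda\tau\, d^2(x_\tau, y) + 2\tau\{\phi(y) - \phi(x_\tau)\} - \frac{K}{2} d^2(x, x_\tau) \\
&\le d^2(x, y) - \lambda\tau\, d^2(x_\tau, y) + 2\tau\{\phi(y) - \phi(x_\tau)\} + \max\{0, -K\}\, \tau \{\phi(x) - \phi(x_\tau)\}.
\end{aligned}
\]

(We remark that $C = C(x, \tau_*(\phi), \phi(x), \tau_*(\phi)/8)$ in the lemma means the constant $C(x_*, \tau_*(\phi), Q, T)$ from Lemma 2.3 with $x_* = x$, $Q = \phi(x)$ and $T = \tau_*(\phi)/8$.)

Proof. Observe that $d^2(x, x_\tau) \le 2C\tau < \pi^2$ by Lemma 2.3 and the choice of $\tau$. Let $\gamma : [0, 1] \to X$ be the minimal geodesic from $x_\tau$ to $y$, and $\eta : [0, 1] \to X$ from $x_\tau$ to $x$. For any $s \in (0, 1)$, by the definition of $J_\tau^\phi(x)$ and the $\lambda$-convexity of $\phi$, we have

\[
\begin{aligned}
\phi(x_\tau) + \frac{d^2(x, x_\tau)}{2\tau} &\le \phi\big(\gamma(s)\big) + \frac{d^2(x, \gamma(s))}{2\tau} \\
&\le (1 - s)\phi(x_\tau) + s\phi(y) - \frac{\lambda}{2}(1 - s)s\, d^2(x_\tau, y) + \frac{d^2(x, \gamma(s))}{2\tau}.
\end{aligned}
\]

Hence

\[ \phi(x_\tau) \le \phi(y) + \frac{1}{2\tau} \cdot \frac{d^2(x, \gamma(s)) - d^2(x, x_\tau)}{s} - \frac{\lambda}{2}(1 - s)\, d^2(x_\tau, y). \]

Applying the first variation formula (Theorem 2.9) twice, we observe the commutativity:

\[ \lim_{s \to 0} \frac{d^2(x, \gamma(s)) - d^2(x, x_\tau)}{s} = -2 d(x_\tau, x)\, d(x_\tau, y) \cos\angle_{x_\tau}(\gamma, \eta) = \lim_{t \to 0} \frac{d^2(\eta(t), y) - d^2(x_\tau, y)}{t}. \tag{3.1} \]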

Notice that $\eta$ is contained in $B(y, R)$ by the choice of $y$. Thus it follows from the $K$-convexity of $d^2(\cdot, y)$ in $B(y, R)$ that

\[
\begin{aligned}
\lim_{t \to 0} \frac{d^2(\eta(t), y) - d^2(x_\tau, y)}{t} &\le d^2(x, y) - d^2(x_\tau, y) - \frac{K}{2} d^2(x, x_\tau) \\
&\le d^2(x, y) - d^2(x_\tau, y) + \max\{0, -K\}\, \tau \{\phi(x) - \phi(x_\tau)\}.
\end{aligned}
\]

This completes the proof.

Remark 3.2 (a) Used in the proof of [AGS1, Theorem 4.1.2(ii)] is the direct application of the convexity of $\phi$ and $d^2(x, \cdot)$ along $\gamma$, which implies in our setting

\[ \frac{K}{2} d^2(x_\tau, y) \le d^2(x, y) - \lambda\tau\, d^2(x_\tau, y) + 2\tau\{\phi(y) - \phi(x_\tau)\} - d^2(x, x_\tau). \]

This coincides with our estimate when $K = 2$. The commutativity was used to move the coefficient $K/2$ from $d^2(x_\tau, y)$ to $d^2(x, x_\tau)$, then one can efficiently estimate $d^2(x_\tau, y) - d^2(x, y)$ (see Theorem 4.1).

(b) As we mentioned in the introduction (see the paragraph including (1.2)), the Riemannian nature of the space (i.e., the angle) is essential in the commutativity (3.1). In fact, on a Finsler manifold $(M, F)$, (1.2) (written using only the distance) implies

\[ g_v(v, w) = g_w(v, w) \qquad \text{for all } v, w \in T_xM \setminus \{0\},\ x \in M. \]

These notations and the basics of Finsler geometry can be found in [OS] for instance. Thus we find, for $v \neq \pm w$,

\[
\begin{aligned}
F^2(v + w) + F^2(v - w) &= g_{v+w}(v + w, v + w) + g_{v-w}(v - w, v - w) \\
&= g_{v+w}(v, v + w) + g_{v+w}(w, v + w) + g_{v-w}(v, v - w) - g_{v-w}(w, v - w) \\
&= g_v(v, v + w) + g_w(w, v + w) + g_v(v, v - w) - g_w(w, v - w) \\
&= 2g_v(v, v) + 2g_w(w, w) = 2F^2(v) + 2F^2(w).
\end{aligned}
\]

This is the parallelogram identity on $T_xM$ and hence $F$ is Riemannian.

The commutativity (1.2) is the essential property connecting the (geodesic) convexity of a function and the contraction property of its gradient flow (see Theorem 4.7). On Finsler manifolds, the contraction property is characterized by the skew-convexity which is different from the usual convexity along geodesics (see [OS] for details).

4 Applications to gradient flows

The estimate in Lemma 3.1 is worse than the one in [AGS1, Theorem 4.1.2(ii)] because of the generality that $K$ can be less than 2 and even negative. Nonetheless, as we shall see in this section, Lemma 3.1 is enough to generalize the argumentation in Chapter 4 of [AGS1]. We will give at least sketches of the proofs for completeness.

Our argument covers two cases. In both cases, $(X, d)$ is complete, $\phi : X \to (-\infty, \infty]$ is lower semi-continuous, $\lambda$-convex and $D(\phi) \neq \emptyset$.

Case I $(X, d)$ is a CAT(1)-space.

Case II $(X, d)$ satisfies the commutativity (3.1) and the $K$-convexity of the squared distance function, and $\phi$ satisfies the coercivity condition (Assumption 2.1(1)). (To be precise, the commutativity, $K$-convexity and $\lambda$-convexity are assumed to hold along the same family of geodesics.)

We stress that both $\lambda, K \in \mathbb{R}$ can be negative. In Case II, the $K$-convexity is assumed globally, thus the assertion of Lemma 3.1 holds for any $x, y \in D(\phi)$ and $\tau \in (0, \tau_*(\phi))$. We recall that the coercivity holds if, for instance, the compactness condition (Assumption 2.1(2)) is satisfied. In Case I, the coercivity is guaranteed by restricting ourselves to balls with radii $R < \pi/4$. In these (convex) balls the squared distance function is $K$-convex with $K = K(R) > 0$ from Lemma 2.8, and hence $\phi + d^2(x, \cdot)/(2\tau)$ is $(\lambda + K/(2\tau))$-convex. This implies that $J_\tau^\phi(x)$ is nonempty and consists of a single point for $\tau \in (0, -K/(2\lambda))$ even if $\lambda < 0$ (by, for example, [AGS1, Lemma 2.4.8]). By the same reasoning, if $K > 0$ in Case II, then Assumption 2.1(1) is redundant.

To include both cases keeping clarity of the presentation, we discuss under the global $K$-convexity of the squared distance function and Assumption 2.1(1). Thus we implicitly assume $\operatorname{diam} X < \pi/2$ if we are in Case I. This costs no generality for the construction of gradient curves since it is a local problem. We explain how to extend the properties of the gradient flow to the case of $\operatorname{diam} X \ge \pi/2$ in §4.6.

4.1 Interpolations

Given an initial point $x_0 \in D(\phi)$ and a partition $P_\tau$ with $|\tau| < \tau_*(\phi)$, we fix a discrete solution $\{x_\tau^k\}_{k \ge 0}$ of (2.1). Let us also take a point $y \in X$. Similarly to Chapter 4 of [AGS1], we interpolate the discrete data $x_\tau^k$, $d(x_\tau^k, y)$ and $\phi(x_\tau^k)$ as follows (recall (2.4)): For $t \in (t_\tau^{k-1}, t_\tau^k]$, $k \in \mathbb{N}$, define

\[
\begin{aligned}
\bar{x}_\tau(t) &:= x_\tau^k \in J_{\tau_k}^\phi(x_\tau^{k-1}) \qquad \big(\bar{x}_\tau(0) := x_0\big), \\
d_\tau(t; y) &:= \Big\{ d^2(x_\tau^{k-1}, y) + \frac{t - t_\tau^{k-1}}{\tau_k} \big\{ d^2(x_\tau^k, y) - d^2(x_\tau^{k-1}, y) \big\} \Big\}^{1/2}, \\
\phi_\tau(t) &:= \phi(x_\tau^{k-1}) + \frac{t - t_\tau^{k-1}}{\tau_k} \big\{ \phi(x_\tau^k) - \phi(x_\tau^{k-1}) \big\}.
\end{aligned}
\]

Recall that $\tau_k = t_\tau^k - t_\tau^{k-1}$ and note that $\phi_\tau$ is non-increasing. Then Lemma 3.1 yields the following discrete version of the evolution variational inequality (see [AGS1, Theorem 4.1.4]; we remark that our residual function $R_{\tau,K}$ is different from that in [AGS1] and depends on $K$).

Theorem 4.1 (Discrete evolution variational inequality) Assuming $|\tau| < \tau_*(\phi)$, we have

\[ \frac{1}{2} \frac{d}{dt}\big[ d_\tau^2(t; y) \big] + \frac{\lambda}{2} d^2\big(\bar{x}_\tau(t), y\big) + \phi_\tau(t) - \phi(y) \le R_{\tau,K}(t) \]

for almost all $t \in (0, T)$ and all $y \in D(\phi)$, where for $t \in (t_\tau^{k-1}, t_\tau^k]$

\[ R_{\tau,K}(t) := \Big( \frac{t_\tau^k - t}{\tau_k} + \frac{\max\{0, -K\}}{2} \Big) \big\{ \phi(x_\tau^{k-1}) - \phi(x_\tau^k) \big\}. \]
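The discrete evolution variational inequality can be tested in the simplest instance of Case II: the real line, where the squared distance is $K$-convex with $K = 2$ and the residual reduces to its first term. The Python sketch below uses the hypothetical potential $\phi(x) = |x| + (\lambda/2)x^2$ with $\lambda < 0$, whose resolvent is soft-thresholding followed by division by $1 + \lambda\tau$. The potential, the uniform step size and the sample points $y$ are assumptions of this example, not choices made in the article.

```python
def phi(x, lam):
    """lam-convex potential phi(x) = |x| + (lam/2) x^2 on the real line."""
    return abs(x) + 0.5 * lam * x * x

def resolvent(x, tau, lam):
    """Minimizer of phi(z) + (x - z)^2 / (2 tau), assuming 1 + lam*tau > 0."""
    shrunk = max(abs(x) - tau, 0.0)
    return (shrunk if x >= 0 else -shrunk) / (1.0 + lam * tau)

lam, tau, K = -0.5, 0.1, 2.0      # K = 2: squared distance of a Hilbert space
xs = [1.5]
for _ in range(30):               # discrete solution with uniform steps tau
    xs.append(resolvent(xs[-1], tau, lam))

worst = float("-inf")             # largest observed value of (left side) - (residual)
for k in range(1, len(xs)):
    prev, cur = xs[k - 1], xs[k]
    drop = phi(prev, lam) - phi(cur, lam)
    for theta in (0.25, 0.5, 0.75):           # t = t^{k-1} + theta * tau
        phi_interp = phi(prev, lam) - theta * drop
        residual = ((1.0 - theta) + 0.5 * max(0.0, -K)) * drop
        for y in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0):
            lhs = (((cur - y) ** 2 - (prev - y) ** 2) / (2.0 * tau)
                   + 0.5 * lam * (cur - y) ** 2 + phi_interp - phi(y, lam))
            worst = max(worst, lhs - residual)

print(worst)
```

If the inequality of Theorem 4.1 holds at every sampled time and point, `worst` is non-positive up to rounding error.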

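Proof. Note first that Lemma 3.1 is available merely under $|\tau| < \tau_*(\phi)$ in the current situation. Then we immediately obtain for $t \in (t_\tau^{k-1}, t_\tau^k)$

\[
\begin{aligned}
\frac{1}{2} \frac{d}{dt}\big[ d_\tau^2(t; y) \big] &= \frac{d^2(x_\tau^k, y) - d^2(x_\tau^{k-1}, y)}{2\tau_k} \\
&\le -\frac{\lambda}{2} d^2(x_\tau^k, y) + \phi(y) - \phi(x_\tau^k) + \frac{\max\{0, -K\}}{2} \big\{ \phi(x_\tau^{k-1}) - \phi(x_\tau^k) \big\} \\
&= -\frac{\lambda}{2} d^2\big(\bar{x}_\tau(t), y\big) + \phi(y) - \phi_\tau(t) + \frac{t_\tau^k - t}{\tau_k} \big\{ \phi(x_\tau^{k-1}) - \phi(x_\tau^k) \big\} + \frac{\max\{0, -K\}}{2} \big\{ \phi(x_\tau^{k-1}) - \phi(x_\tau^k) \big\}.
\end{aligned}
\]

This completes the proof.

Taking the limit in Theorem 4.1 as $|\tau| \to 0$ will indeed lead to the (continuous) evolution variational inequality (Theorem 4.8). Another application of Theorem 4.1 is a comparison between two discrete solutions generated from different partitions:

\[ P_\tau = \{0 = t_\tau^0 < t_\tau^1 < \cdots\}, \qquad P_\sigma = \{0 = s_\sigma^0 < s_\sigma^1 < \cdots\}. \]

We set $\sigma_l := s_\sigma^l - s_\sigma^{l-1}$ for $l \in \mathbb{N}$. We first observe the following modification of Theorem 4.1 (see [AGS1, Lemma 4.1.6]).

Lemma 4.2 Suppose $|\tau| < \tau_*(\phi)$ and $\lambda \le 0$. Then we have

\[ \frac{1}{2} \frac{d}{dt}\big[ d_\tau^2(t; y) \big] + \frac{\lambda}{2} d_\tau^2(t; y) + \lambda D_\tau(t)\, d_\tau(t; y) + \phi_\tau(t) - \phi(y) \le R_{\tau,K}(t) - \frac{\lambda}{2} D_\tau^2(t) \]

for almost all $t \in (0, T)$ and all $y \in D(\phi)$, where for $t \in (t_\tau^{k-1}, t_\tau^k]$

\[ D_\tau(t) := \frac{t_\tau^k - t}{\tau_k}\, d(x_\tau^{k-1}, x_\tau^k). \]

Proof. This is a consequence of Theorem 4.1 and the inequality

\[
\begin{aligned}
d\big(\bar{x}_\tau(t), y\big) = d(x_\tau^k, y) &\le \frac{t - t_\tau^{k-1}}{\tau_k}\, d(x_\tau^k, y) + \frac{t_\tau^k - t}{\tau_k} \big\{ d(x_\tau^{k-1}, y) + d(x_\tau^{k-1}, x_\tau^k) \big\} \\
&\le d_\tau(t; y) + D_\tau(t),
\end{aligned}
\]

which follows only from the triangle inequality and the convexity of $f(r) = r^2$, $r \in \mathbb{R}$.

Notice that Lemma 4.2 reduces to Theorem 4.1 when $\lambda = 0$. Applying a version of the Gronwall lemma, we obtain from Lemma 4.2 (or directly from Theorem 4.1 if $\lambda = 0$) a comparison estimate between $\{x_\tau^k\}_{k \ge 0}$ and $\{y_\sigma^l\}_{l \ge 0}$ with $y_\sigma^0 = y_0 \in D(\phi)$ (see [AGS1, Corollaries 4.1.5, 4.1.7]). For $s \in (s_\sigma^{l-1}, s_\sigma^l]$, we set

\[ d_{\tau\sigma}(t, s) := \Big\{ d_\tau^2(t; y_\sigma^{l-1}) + \frac{s - s_\sigma^{l-1}}{\sigma_l} \big\{ d_\tau^2(t; y_\sigma^l) - d_\tau^2(t; y_\sigma^{l-1}) \big\} \Big\}^{1/2}. \]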

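Observe that, for $(t, s) \in (t_\tau^{k-1}, t_\tau^k] \times (s_\sigma^{l-1}, s_\sigma^l]$,

\[
\begin{aligned}
d_{\tau\sigma}^2(t, s) ={}& \Big(1 - \frac{t - t_\tau^{k-1}}{\tau_k}\Big) \Big(1 - \frac{s - s_\sigma^{l-1}}{\sigma_l}\Big) d^2(x_\tau^{k-1}, y_\sigma^{l-1}) + \Big(1 - \frac{t - t_\tau^{k-1}}{\tau_k}\Big) \frac{s - s_\sigma^{l-1}}{\sigma_l}\, d^2(x_\tau^{k-1}, y_\sigma^l) \\
&+ \frac{t - t_\tau^{k-1}}{\tau_k} \Big(1 - \frac{s - s_\sigma^{l-1}}{\sigma_l}\Big) d^2(x_\tau^k, y_\sigma^{l-1}) + \frac{t - t_\tau^{k-1}}{\tau_k} \cdot \frac{s - s_\sigma^{l-1}}{\sigma_l}\, d^2(x_\tau^k, y_\sigma^l).
\end{aligned}
\]

Corollary 4.3 (Comparison between two discrete solutions) Assume $\lambda \le 0$ and $|\tau|, |\sigma| < \tau_*(\phi)$. Then we have, for almost all $t > 0$,

\[ \frac{d}{dt}\big[ d_{\tau\sigma}^2(t, t) \big] + 2\lambda\, d_{\tau\sigma}^2(t, t) \le -2\lambda \big( D_\tau(t) + D_\sigma(t) \big) d_{\tau\sigma}(t, t) + 2\big( R_{\tau,K}(t) + R_{\sigma,K}(t) \big) - \lambda \big( D_\tau^2(t) + D_\sigma^2(t) \big). \tag{4.1} \]

Moreover, for all $T > 0$,

\[
\begin{aligned}
e^{\lambda T} d_{\tau\sigma}(T, T) \le{}& \Big\{ d^2(x_0, y_0) + \int_0^T e^{2\lambda t} \big\{ 2\big( R_{\tau,K}(t) + R_{\sigma,K}(t) \big) - \lambda \big( D_\tau^2(t) + D_\sigma^2(t) \big) \big\}\, dt \Big\}^{1/2} \\
&- 2\lambda \int_0^T e^{\lambda t} \big( D_\tau(t) + D_\sigma(t) \big)\, dt.
\end{aligned}
\tag{4.2}
\]

Proof. For each fixed $s \in (s_\sigma^{l-1}, s_\sigma^l]$, it follows from Lemma 4.2 that

\[
\begin{aligned}
&\frac{1}{2} \frac{\partial}{\partial t}\big[ d_{\tau\sigma}^2(t, s) \big] + \frac{\lambda}{2} d_{\tau\sigma}^2(t, s) + \phi_\tau(t) - \phi_\sigma(s) \\
&\quad \le -\lambda D_\tau(t) \Big\{ \Big(1 - \frac{s - s_\sigma^{l-1}}{\sigma_l}\Big) d_\tau(t; y_\sigma^{l-1}) + \frac{s - s_\sigma^{l-1}}{\sigma_l}\, d_\tau(t; y_\sigma^l) \Big\} + R_{\tau,K}(t) - \frac{\lambda}{2} D_\tau^2(t) \\
&\quad \le -\lambda D_\tau(t)\, d_{\tau\sigma}(t, s) + R_{\tau,K}(t) - \frac{\lambda}{2} D_\tau^2(t).
\end{aligned}
\]

Combining this with a similar inequality

\[ \frac{1}{2} \frac{\partial}{\partial s}\big[ d_{\tau\sigma}^2(t, s) \big] + \frac{\lambda}{2} d_{\tau\sigma}^2(t, s) + \phi_\sigma(s) - \phi_\tau(t) \le -\lambda D_\sigma(s)\, d_{\tau\sigma}(t, s) + R_{\sigma,K}(s) - \frac{\lambda}{2} D_\sigma^2(s), \]

we obtain the first inequality (4.1). The second assertion (4.2) is a consequence of (4.1) via a version of the Gronwall lemma (see [AGS1, Lemma 4.1.8]).

4.2 Convergence of discrete solutions

Corollary 4.3 implies that the discrete solutions $\{x_\tau^k\}_{k \ge 0}$ converge to a gradient curve as $|\tau| \to 0$ and the limit curve is independent of the choice of the discrete solutions (generalizing [AGS1, Theorem 4.2.2]). Recall §2.1.2 for properties of gradient curves.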

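Theorem 4.4 (Unique limits of discrete solutions) Fix an initial point $x_0 \in D(\phi)$ and consider discrete solutions $\{x_{\tau_i}^k\}_{k \ge 0}$ with $x_{\tau_i}^0 = x_0$ associated with a sequence of partitions $\{\mathcal{P}_{\tau_i}\}_{i \in \mathbb{N}}$ such that $\lim_{i \to \infty} |\tau_i| = 0$. Then the interpolated curve $\bar{x}_{\tau_i} : [0,\infty) \to X$ as in §4.1 converges to a curve $\xi : [0,\infty) \to X$ with $\xi(0) = x_0$ as $i \to \infty$, uniformly on each bounded interval $[0,T]$. In particular, the limit curve $\xi$ is independent of the choices of the sequence of partitions and discrete solutions.

Proof. Since $(X,d)$ is complete, it is sufficient to show $d_{\tau_i\tau_j}(t,t) \to 0$ as $i,j \to \infty$ uniformly on $[0,T]$, in the notations of Corollary 4.3. Let $\lambda < 0$ without loss of generality. Take $i_0$ large enough to satisfy $|\tau_i| \le \tau_*(\phi)/8$ for all $i \ge i_0$, and put $K' := \min\{0,K\} \le 0$. The integral of $R_{\tau,K}$ (recall Theorem 4.1 for the definition) is calculated as
\[
\int_{t_\tau^{k-1}}^{t_\tau^k} R_{\tau,K}\, dt = \Big( \frac{1}{2} - K' \Big)\tau_k\{\phi(x_\tau^{k-1}) - \phi(x_\tau^k)\}.
\]
Similarly, together with the canonical estimate $d^2(x_\tau^{k-1},x_\tau^k) \le 2\tau_k\{\phi(x_\tau^{k-1}) - \phi(x_\tau^k)\}$, we have
\[
\int_{t_\tau^{k-1}}^{t_\tau^k} D_\tau^2\, dt \le \frac{2}{3}\tau_k^2\{\phi(x_\tau^{k-1}) - \phi(x_\tau^k)\}.
\]
This also implies
\[
\int_0^{t_\tau^k} e^{\lambda t} D_\tau(t)\, dt \le \Big( \int_0^{t_\tau^k} e^{2\lambda t}\, dt \Big)^{1/2}\Big( \int_0^{t_\tau^k} D_\tau^2\, dt \Big)^{1/2} \le \sqrt{\frac{1}{-2\lambda}\cdot\frac{2|\tau|}{3}\{\phi(x_0) - \phi(x_\tau^k)\}}.
\]
Combining these with (4.2), we obtain, for $i,j \ge i_0$, $t \le t_{\tau_i}^k \le T$ and $t \le t_{\tau_j}^l \le T$,
\[
\begin{aligned}
e^{\lambda t} d_{\tau_i\tau_j}(t,t)
&\le \Big\{ \int_0^t \big\{ 2(R_{\tau_i,K} + R_{\tau_j,K}) - \lambda(D_{\tau_i}^2 + D_{\tau_j}^2) \big\}\, ds \Big\}^{1/2} - 2\lambda\int_0^t e^{\lambda s}\big( D_{\tau_i}(s) + D_{\tau_j}(s) \big)\, ds \\
&\le \Big\{ \Big( 1 - 2K' - \frac{2\lambda}{3}|\tau_i| \Big)|\tau_i|\{\phi(x_0) - \phi(x_{\tau_i}^k)\} + \Big( 1 - 2K' - \frac{2\lambda}{3}|\tau_j| \Big)|\tau_j|\{\phi(x_0) - \phi(x_{\tau_j}^l)\} \Big\}^{1/2} \\
&\quad + \sqrt{-\frac{4\lambda}{3}}\Big( \sqrt{|\tau_i|\{\phi(x_0) - \phi(x_{\tau_i}^k)\}} + \sqrt{|\tau_j|\{\phi(x_0) - \phi(x_{\tau_j}^l)\}} \Big).
\end{aligned}
\]
Thanks to the a priori estimate (Lemma 2.3), we have
\[
\max\{\phi(x_0) - \phi(x_{\tau_i}^k),\ \phi(x_0) - \phi(x_{\tau_j}^l)\} \le C = C\big(x_0, \tau_*(\phi), \phi(x_0), T\big).
\]
Therefore we conclude that $d_{\tau_i\tau_j}(t,t)$ tends to $0$ as $i,j \to \infty$, uniformly in $t \in [0,T]$.

By virtue of the uniqueness, we can define the gradient flow operator
\[
G : [0,\infty) \times D(\phi) \to D(\phi) \tag{4.3}
\]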

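by $G(t,x_0) := \xi(t)$, where $\xi : [0,\infty) \to X$ is the unique gradient curve with $\xi(0) = x_0$ given by Theorem 4.4. Then the semigroup property
\[
G\big(t, G(s,x_0)\big) = G(s+t, x_0) \quad \text{for all } s,t \ge 0
\]
also follows from the uniqueness of gradient curves.

One can immediately obtain the following (rough) error estimate from the proof of Theorem 4.4 (compare this with [AGS1, Theorem 4.0.9]).

Corollary 4.5 (An error estimate) Let $\lambda \le 0$ and $|\tau| < \tau_*(\phi)$, fix $x_0 \in D(\phi)$, and put $\xi(t) := G(t,x_0)$. Then we have
\[
d_\tau\big(t; \xi(t)\big) \le e^{-\lambda t}\Big( \sqrt{1 - 2K' - \frac{2\lambda}{3}|\tau|} + \sqrt{-\frac{4\lambda}{3}} \Big)\sqrt{|\tau|\{\phi(x_0) - \phi(\bar{x}_\tau(t))\}}
\]
for all $t > 0$, where $K' := \min\{0,K\}$.

Proof. Taking the limit of (4.2) as $|\sigma| \to 0$ and using the estimates in the proof of Theorem 4.4, we have, for $t \in (t_\tau^{k-1}, t_\tau^k]$,
\[
\begin{aligned}
e^{\lambda t} d_\tau\big(t; \xi(t)\big)
&\le \Big\{ \int_0^t ( 2R_{\tau,K} - \lambda D_\tau^2 )\, ds \Big\}^{1/2} - 2\lambda\int_0^t e^{\lambda s} D_\tau(s)\, ds \\
&\le \Big\{ \Big( 1 - 2K' - \frac{2\lambda}{3}|\tau| \Big)|\tau|\{\phi(x_0) - \phi(x_\tau^k)\} \Big\}^{1/2} + \sqrt{-\frac{4\lambda}{3}|\tau|\{\phi(x_0) - \phi(x_\tau^k)\}} \\
&= \Big( \sqrt{1 - 2K' - \frac{2\lambda}{3}|\tau|} + \sqrt{-\frac{4\lambda}{3}} \Big)\sqrt{|\tau|\{\phi(x_0) - \phi(\bar{x}_\tau(t))\}}.
\end{aligned}
\]
This completes the proof.

4.3 Contraction property

Coming back to the discrete scheme, we show the following lemma, which readily implies the contraction property of $G$ (Theorem 4.7). Set
\[
\lambda_\tau := \frac{\log(1 + \lambda|\tau|)}{|\tau|},
\]
assuming $\lambda|\tau| > -1$ (if $\lambda < 0$), and observe that $\lambda_\tau \le \lambda$ and
\[
\frac{\log(1 + \lambda\tau_k)}{\tau_k} \ge \lambda_\tau \tag{4.4}
\]
for all $k \in \mathbb{N}$ (see [AGS1, Lemma 3.4.1]).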
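The two elementary facts behind (4.4), namely $\lambda_\tau \le \lambda$ and $\log(1+\lambda\tau_k)/\tau_k \ge \lambda_\tau$ whenever $\tau_k \le |\tau|$, can be checked numerically. The sketch below is only a sanity check under assumed sample values; the function names `lam_tau` and `check_44` and the step sizes are hypothetical.

```python
import math

def lam_tau(lam, mesh):
    """lambda_tau := log(1 + lam*|tau|) / |tau|, assuming lam*|tau| > -1."""
    if 1.0 + lam * mesh <= 0.0:
        raise ValueError("need lam * mesh > -1")
    return math.log1p(lam * mesh) / mesh

def check_44(lam, steps):
    """Return lambda_tau and whether lambda_tau <= lam and (4.4) hold for all steps."""
    mesh = max(steps)                      # |tau| is the largest step size
    lt = lam_tau(lam, mesh)
    ok_upper = lt <= lam + 1e-12
    ok_steps = all(math.log1p(lam * t) / t >= lt - 1e-12 for t in steps)
    return lt, ok_upper, ok_steps

steps = [0.05, 0.2, 0.1, 0.01]             # sample partition step sizes
results = {lam: check_44(lam, steps) for lam in (-2.0, -0.5, 0.5, 3.0)}
for lam, (lt, ok_upper, ok_steps) in results.items():
    print(lam, round(lt, 4), ok_upper, ok_steps)
```

Both properties follow from the concavity of the logarithm: $r \mapsto \log(1+\lambda r)/r$ is non-increasing in $r$ and bounded above by $\lambda$.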

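Lemma 4.6 (Discrete contraction estimate) Take $x_0, y_0 \in D(\phi)$ and $\mathcal{P}_\tau$ with $|\tau| < \tau_*(\phi)/8$. Assume $\lambda|\tau| > -1$ if $\lambda < 0$. Then we have, for $t \in (t_\tau^{k-1}, t_\tau^k]$ with $t_\tau^k \le T$,
\[
\begin{aligned}
e^{2(\lambda_\tau t + \lambda_\tau^- |\tau|)} d^2\big( \bar{x}_\tau(t), \bar{y}_\tau(t) \big)
&\le d^2(x_0,y_0) + 2e^{2\lambda_\tau^+ t_\tau^k}|\tau|\{\phi(y_0) - \phi(y_\tau^k)\} \\
&\quad - 2K' e^{2\lambda_\tau^+ t_\tau^k}|\tau|\{\phi(x_0) - \phi(x_\tau^k) + \phi(y_0) - \phi(y_\tau^k)\} + O_{x_0,y_0,T}\big(\sqrt{|\tau|}\big),
\end{aligned}
\]
where $K' := \min\{0,K\}$, $\lambda_\tau^- := \min\{0,\lambda_\tau\}$ and $\lambda_\tau^+ := \max\{0,\lambda_\tau\}$.

Proof. The proof is along the line of [AGS1, Lemma 4.2.4]. Applying Lemma 3.1, we have for each $k \in \mathbb{N}$
\[
d^2(x_\tau^k, y_\tau^{k-1}) - d^2(x_\tau^{k-1}, y_\tau^{k-1}) \le -\lambda\tau_k d^2(x_\tau^k, y_\tau^{k-1}) + 2\tau_k\{\phi(y_\tau^{k-1}) - \phi(x_\tau^k)\} - 2K'\tau_k\{\phi(x_\tau^{k-1}) - \phi(x_\tau^k)\}
\]
and
\[
d^2(x_\tau^k, y_\tau^k) - d^2(x_\tau^k, y_\tau^{k-1}) \le -\lambda\tau_k d^2(x_\tau^k, y_\tau^k) + 2\tau_k\{\phi(x_\tau^k) - \phi(y_\tau^k)\} - 2K'\tau_k\{\phi(y_\tau^{k-1}) - \phi(y_\tau^k)\}.
\]
Thus we have
\[
\begin{aligned}
(1 + \lambda\tau_k) d^2(x_\tau^k, y_\tau^k) - d^2(x_\tau^{k-1}, y_\tau^{k-1})
&\le -\lambda\tau_k d^2(x_\tau^k, y_\tau^{k-1}) + 2\tau_k\{\phi(y_\tau^{k-1}) - \phi(y_\tau^k)\} \\
&\quad - 2K'\tau_k\{\phi(x_\tau^{k-1}) - \phi(x_\tau^k) + \phi(y_\tau^{k-1}) - \phi(y_\tau^k)\}.
\end{aligned}
\]
Note that
\[
\big| d^2(x_\tau^k, y_\tau^{k-1}) - d^2(x_\tau^{k-1}, y_\tau^{k-1}) \big| \le \{ d(x_\tau^k, y_\tau^{k-1}) + d(x_\tau^{k-1}, y_\tau^{k-1}) \}\, d(x_\tau^k, x_\tau^{k-1}) = O_{x_0,y_0,T}\big(\sqrt{|\tau|}\big)
\]
by the a priori estimate (Lemma 2.3, (2.2)). Together with $1 - \lambda\tau_k \le (1 + \lambda\tau_k)^{-1}$, this implies
\[
\begin{aligned}
(1 + \lambda\tau_k) d^2(x_\tau^k, y_\tau^k)
&\le \frac{1}{1 + \lambda\tau_k} d^2(x_\tau^{k-1}, y_\tau^{k-1}) + 2\tau_k\{\phi(y_\tau^{k-1}) - \phi(y_\tau^k)\} \\
&\quad - 2K'\tau_k\{\phi(x_\tau^{k-1}) - \phi(x_\tau^k) + \phi(y_\tau^{k-1}) - \phi(y_\tau^k)\} + \tau_k O_{x_0,y_0,T}\big(\sqrt{|\tau|}\big).
\end{aligned}
\]
Multiplying both sides by $e^{\lambda_\tau(t_\tau^{k-1} + t_\tau^k)} = e^{\lambda_\tau(2t_\tau^k - \tau_k)}$ yields, since by (4.4)
\[
(1 + \lambda\tau_k) e^{-\lambda_\tau\tau_k} e^{2\lambda_\tau t_\tau^k} \ge e^{2\lambda_\tau t_\tau^k}, \qquad \frac{e^{2\lambda_\tau t_\tau^{k-1}} e^{\lambda_\tau\tau_k}}{1 + \lambda\tau_k} \le e^{2\lambda_\tau t_\tau^{k-1}},
\]
that
\[
\begin{aligned}
e^{2\lambda_\tau t_\tau^k} d^2(x_\tau^k, y_\tau^k)
&\le e^{2\lambda_\tau t_\tau^{k-1}} d^2(x_\tau^{k-1}, y_\tau^{k-1}) + 2e^{\lambda_\tau(t_\tau^{k-1} + t_\tau^k)}\tau_k\{\phi(y_\tau^{k-1}) - \phi(y_\tau^k)\} \\
&\quad - 2K' e^{\lambda_\tau(t_\tau^{k-1} + t_\tau^k)}\tau_k\{\phi(x_\tau^{k-1}) - \phi(x_\tau^k) + \phi(y_\tau^{k-1}) - \phi(y_\tau^k)\} + \tau_k O_{x_0,y_0,T}\big(\sqrt{|\tau|}\big).
\end{aligned}
\]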

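Summing up, for $t \in (t_\tau^{k-1}, t_\tau^k]$, we obtain the desired estimate
\[
\begin{aligned}
e^{2(\lambda_\tau t + \lambda_\tau^- |\tau|)} d^2\big( \bar{x}_\tau(t), \bar{y}_\tau(t) \big)
&\le e^{2\lambda_\tau t_\tau^k} d^2(x_\tau^k, y_\tau^k) \\
&\le d^2(x_0,y_0) + 2e^{2\lambda_\tau^+ t_\tau^k}|\tau|\{\phi(y_0) - \phi(y_\tau^k)\} \\
&\quad - 2K' e^{2\lambda_\tau^+ t_\tau^k}|\tau|\{\phi(x_0) - \phi(x_\tau^k) + \phi(y_0) - \phi(y_\tau^k)\} + t_\tau^k\, O_{x_0,y_0,T}\big(\sqrt{|\tau|}\big).
\end{aligned}
\]

Theorem 4.7 (Contraction property) Take $x_0, y_0 \in D(\phi)$ and put $\xi(t) := G(t,x_0)$ and $\zeta(t) := G(t,y_0)$. Then we have, for any $t > 0$,
\[
d\big( \xi(t), \zeta(t) \big) \le e^{-\lambda t} d(x_0,y_0). \tag{4.5}
\]

Proof. Take the limit as $|\tau| \to 0$ in Lemma 4.6. Then the claim follows from $\lim_{|\tau| \to 0} \lambda_\tau = \lambda$ and the a priori estimate in Lemma 2.3 (which bounds $\phi(x_0) - \phi(x_\tau^k)$).

The contraction property allows us to take the continuous limit $G : [0,\infty) \times \overline{D(\phi)} \to \overline{D(\phi)}$ of the gradient flow operator in (4.3), which again enjoys the semigroup property as well as the contraction property (4.5). One can alternatively derive the contraction property from the evolution variational inequality (4.6) below; nevertheless, we think that this direct proof and the discrete estimate in Lemma 4.6 are worthwhile as well.

4.4 Evolution variational inequality

Similarly to [AGS1, Theorem 4.3.2], taking the limit of Theorem 4.1, we obtain the following.

Theorem 4.8 (Evolution variational inequality) Take $x_0 \in D(\phi)$ and put $\xi(t) := G(t,x_0)$. Then we have
\[
\limsup_{\varepsilon \downarrow 0} \frac{d^2(\xi(t+\varepsilon),y) - d^2(\xi(t),y)}{2\varepsilon} + \frac{\lambda}{2} d^2\big( \xi(t), y \big) + \phi\big( \xi(t) \big) \le \phi(y) \tag{4.6}
\]
for all $y \in D(\phi)$ and $t > 0$. In particular,
\[
\frac{1}{2}\frac{d}{dt}\big[ d^2\big( \xi(t), y \big) \big] + \frac{\lambda}{2} d^2\big( \xi(t), y \big) + \phi\big( \xi(t) \big) \le \phi(y)
\]
for all $y \in D(\phi)$ and almost all $t > 0$.

Proof. By recalling the estimate of the integral of $R_{\tau,K}$ in the proof of Theorem 4.4, integration in $t \in [S,T]$ of Theorem 4.1 gives
\[
\frac{1}{2} d_\tau^2(T;y) - \frac{1}{2} d_\tau^2(S;y) + \int_S^T \Big\{ \frac{\lambda}{2} d^2\big( \bar{x}_\tau(t), y \big) + \phi_\tau(t) \Big\}\, dt \le (T - S)\phi(y) + \Big( \frac{1}{2} - K' \Big)|\tau|\{\phi(x_\tau^{k_S - 1}) - \phi(x_\tau^{k_T})\},
\]
where $k_S \le k_T$ denote the indices with $S \in (t_\tau^{k_S-1}, t_\tau^{k_S}]$ and $T \in (t_\tau^{k_T-1}, t_\tau^{k_T}]$.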

Note that $\phi_\tau$ is uniformly bounded on $[0,T]$ thanks to the a priori estimate (Lemma 2.3). Thus we have
\[
\int_S^T \phi \circ \xi\, dt \le \int_S^T \liminf_{|\tau| \to 0} \phi_\tau\, dt \le \liminf_{|\tau| \to 0} \int_S^T \phi_\tau\, dt
\]
by the lower semi-continuity of $\phi$ and Fatou's lemma. Therefore letting $|\tau| \to 0$ shows the integrated form of the evolution variational inequality:
\[
\frac{d^2(\xi(T),y) - d^2(\xi(S),y)}{2} + \int_S^T \Big\{ \frac{\lambda}{2} d^2\big( \xi(t), y \big) + \phi\big( \xi(t) \big) \Big\}\, dt \le (T - S)\phi(y).
\]
Dividing both sides by $T - S$ and letting $T - S \downarrow 0$, we obtain the desired inequality by the lower semi-continuity of $\phi$. (We remark that $\phi \circ \xi$ is in fact continuous; see [AGS1, Theorem 2.4.15].)

4.5 Stationary points and large time behavior of the flow

In this subsection, following the argumentation in [Ma] (on CAT(0)-spaces), we study stationary points and the large time behavior of the gradient flow $G$. Since the fundamental properties of the flow (those whose proofs in [Ma, Section 1] rely on the CAT(0)-property) are already in hand, we can follow the line of [Ma, Section 2] almost verbatim.

We begin with a consequence of the evolution variational inequality (Theorem 4.8) corresponding to [Ma, Lemma 2.8].

Lemma 4.9 Take $x_0 \in D(\phi)$ and put $\xi(t) := G(t,x_0)$. Then we have
\[
d^2\big( \xi(T), y \big) \le e^{-\lambda T} d^2(x_0,y) + 2e^{-\lambda T}\int_0^T e^{\lambda t}\{\phi(y) - \phi(\xi(t))\}\, dt
\]
for all $T > 0$ and $y \in D(\phi)$. In particular, we have
\[
d^2\big( \xi(T), y \big) \le e^{-\lambda T} d^2(x_0,y) - \frac{2(1 - e^{-\lambda T})}{\lambda}\{\phi(\xi(T)) - \phi(y)\},
\]
where $(1 - e^{-\lambda T})/\lambda := T$ when $\lambda = 0$.

Proof. Rewrite (4.6) as
\[
\limsup_{\varepsilon \downarrow 0} \frac{e^{\lambda(t+\varepsilon)} d^2(\xi(t+\varepsilon),y) - e^{\lambda t} d^2(\xi(t),y)}{2\varepsilon} \le e^{\lambda t}\{\phi(y) - \phi(\xi(t))\},
\]
which implies the first claim. The second claim readily follows from this and the fact that $\phi(\xi(t))$ is non-increasing in $t$.

By the above lemma, one can show a characterization of stationary points of the flow $G$ in terms of the local slope $|\partial\phi|$.

Theorem 4.10 (A characterization of stationary points) A point $x_0 \in D(\phi)$ satisfies $|\partial\phi|(x_0) = 0$ if and only if $G(t,x_0) = x_0$ for all $t > 0$.

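Proof. The "only if" part is a consequence of [Ma, Lemma 2.11] asserting that
\[
|\partial\phi|(x_0) = 0 \quad \text{if and only if} \quad \sup_{x \ne x_0} \frac{\phi(x_0) - \phi(x)}{d^2(x_0,x)} < \infty, \tag{4.7}
\]
for which only the $\lambda$-convexity of $\phi$ is used. The "if" part follows from the same relation (4.7) and Lemma 4.9, noticing that $(1 - e^{-\lambda T})/\lambda \ge T$ for $\lambda < 0$. See [Ma, Theorem 2.12] for details.

It is natural to expect that, if $\phi \circ \xi$ does not diverge to $-\infty$, then $|\partial\phi| \circ \xi$ tends to $0$. This is indeed the case, as follows.

Lemma 4.11 Assume $\lambda \le 0$, take $x_0 \in D(\phi)$ and put $\xi(t) := G(t,x_0)$. Then we have
\[
|\partial\phi|\big( \xi(T) \big) \le |\partial\phi|\big( \xi(S) \big) - \sqrt{2}\,\lambda\int_S^T |\partial\phi| \circ \xi\, dt \tag{4.8}
\]
for all $0 < S < T$.

Proof. For any $x \in D(\phi)$ and $x_\tau \in J_\tau^\phi(x)$, it follows from the $\lambda$-convexity of $\phi$ that $|\partial\phi|(x_\tau) \le |\partial\phi|(x) - \lambda d(x,x_\tau)$ (see [Ma, Lemma 2.23]). Substituting $d^2(x,x_\tau) \le 2\tau\{\phi(x) - \phi(x_\tau)\}$ and iterating this estimate, one finds
\[
|\partial\phi|(x_\tau^N) \le |\partial\phi|(x_0) - \lambda\sum_{k=1}^N \sqrt{2\tau_k\{\phi(x_\tau^{k-1}) - \phi(x_\tau^k)\}}.
\]
Applying the Cauchy–Schwarz inequality:
\[
\sum_{k=1}^N \sqrt{2\tau_k\{\phi(x_\tau^{k-1}) - \phi(x_\tau^k)\}} \le \sqrt{2t_\tau^N}\sqrt{\sum_{k=1}^N \{\phi(x_\tau^{k-1}) - \phi(x_\tau^k)\}}
\]
and taking the limit as $|\tau| \to 0$, we have
\[
|\partial\phi|\big( \xi(T) \big) \le |\partial\phi|(x_0) - \lambda\sqrt{2T}\sqrt{\phi(x_0) - \phi\big( \xi(T) \big)} \tag{4.9}
\]
for all $T > 0$, since $\phi$ and $|\partial\phi|$ are lower semi-continuous (by [Ma, Proposition 2.25] or [AGS1, Corollary 2.4.10]). By replacing $x_0$ with $\xi(S)$, the implication from (4.9) to (4.8) is the same as [Ma, Lemma 2.27].

Theorem 4.12 (Large time behavior) Take $x_0 \in D(\phi)$, put $\xi(t) := G(t,x_0)$ and assume $\lim_{t \to \infty} \phi(\xi(t)) > -\infty$. Then we have $\lim_{t \to \infty} |\partial\phi|(\xi(t)) = 0$.

Proof. The proof is done by contradiction with the help of the estimate (4.8) and the right continuity of $|\partial\phi| \circ \xi$ (see [Ma, Corollary 2.28] or [AGS1, Theorem 2.4.15]). We refer to [Ma, Theorem 2.30] for details.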
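The large time behavior can be observed on a toy example. The sketch below assumes the double-well potential $\phi(y) = (y^2-1)^2/4$ on the real line, which is $\lambda$-convex with $\lambda = -1$ and bounded below, and runs the discrete scheme with a fixed step $\tau < 1$. The potential, the bisection solver and the names `phi`, `dphi`, `prox_step` and `run` are assumptions of this illustration, not part of the paper.

```python
def phi(y):
    """Double-well potential; phi'' >= -1, so phi is (-1)-convex."""
    return (y * y - 1.0) ** 2 / 4.0

def dphi(y):
    """Derivative of phi; its absolute value is the local slope on the real line."""
    return y ** 3 - y

def prox_step(x, tau, lo=-3.0, hi=3.0, iters=80):
    """Minimise phi(y) + (y - x)**2 / (2*tau) for tau < 1.

    For tau < 1 the functional is strictly convex, so its minimiser is the
    unique zero of the increasing function y -> dphi(y) + (y - x)/tau,
    which is located here by bisection on [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dphi(mid) + (mid - x) / tau > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def run(x0, tau, n):
    """Discrete solution x^0, ..., x^n on a uniform partition."""
    xs = [x0]
    for _ in range(n):
        xs.append(prox_step(xs[-1], tau))
    return xs

xs = run(0.3, 0.1, 200)
slopes = [abs(dphi(x)) for x in xs]
print(xs[-1], slopes[-1])
```

Here $\phi$ decreases along the discrete solution, the iterates approach the stationary point $1$, and the slope tends to $0$, which is the behavior described by Theorem 4.12 and Corollary 4.13 in this compact one-dimensional setting.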

The following corollary is immediate; see [Ma, Corollary 2.31].

Corollary 4.13 Take $x_0 \in D(\phi)$, put $\xi(t) := G(t,x_0)$ and assume that there is a sequence $\{t_n\}_{n \in \mathbb{N}}$ such that $\lim_{n \to \infty} t_n = \infty$ and $\{\xi(t_n)\}_{n \in \mathbb{N}}$ converges to a point $\bar{x}$. Then $\bar{x}$ is a stationary point of $\phi$ (in the sense of Theorem 4.10) and $\lim_{t \to \infty} \phi(\xi(t)) = \phi(\bar{x})$.

In general, $\lim_{t \to \infty} |\partial\phi|(\xi(t)) = 0$ does not imply the convergence to a stationary point. One needs some compactness condition to find a stationary point; see for instance [Ma, Theorem 2.32].

4.6 The case of CAT(1)-spaces with diameter $\ge \pi/2$

All the results in this section are generalized to complete CAT(1)-spaces $(X,d)$ with $\operatorname{diam} X \ge \pi/2$. First of all, given $x_0 \in D(\phi)$, one can restrict the construction of the gradient curve to, say, the open ball $B(x_0, \pi/6)$. Since $B(x_0, \pi/6)$ is (geodesically) convex, the squared distance function in this ball is $K$-convex with $K = K(\pi/3) > 0$ from Lemma 2.8, and we obtain the gradient curve $\xi$ with $\xi(0) = x_0$. Once $\xi(t)$ hits the boundary $\partial B(x_0, \pi/6)$ at $t = t_1$, we restart the construction in $B(\xi(t_1), \pi/6)$. We remark that $t_1 \ge (\pi/6)^2/(2C)$ by (2.2). Iterating this procedure gives the gradient curve $\xi : [0,\infty) \to X$.

The contraction property and the evolution variational inequality are globalized in a standard way as follows. (Then Theorems 4.10, 4.12 also hold true since they are based only on the evolution variational inequality.)

For the contraction property (Theorem 4.7), if $d(x_0,y_0) \ge \pi/2$, then we consider a minimal geodesic $\gamma$ from $x_0$ to $y_0$ and choose points $z_0 = x_0, z_1, \ldots, z_{m-1}, z_m = y_0$ on $\gamma$ such that $\max_{1 \le l \le m} d(z_{l-1}, z_l) < \pi e^{-|\lambda| T}/2$ for given $T > 0$. Applying Theorem 4.7 to adjacent gradient curves $\xi_l := G(\cdot, z_l)$ shows $d(\xi_{l-1}(t), \xi_l(t)) \le e^{-\lambda t} d(z_{l-1}, z_l)$ for $t \in [0,T]$. This yields $d(\xi(t), \zeta(t)) \le e^{-\lambda t} d(x_0,y_0)$ by the triangle inequality.
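The estimate $d(\xi_{l-1}(t), \xi_l(t)) \le e^{-\lambda t} d(z_{l-1}, z_l)$ has a discrete counterpart that is easy to test on the real line: one step of the scheme for a $\lambda$-convex function is $(1+\lambda\tau)^{-1}$-Lipschitz there. The sketch below assumes the double-well potential $\phi(y) = (y^2-1)^2/4$ (so $\lambda = -1$) and two nearby starting points that end up in different wells; the potential and the name `prox_step` are assumptions of this illustration.

```python
def prox_step(x, tau, lo=-3.0, hi=3.0, iters=80):
    """Minimise (y**2 - 1)**2/4 + (y - x)**2/(2*tau) for tau < 1.

    The derivative y**3 - y + (y - x)/tau is increasing in y when tau < 1,
    so bisection on [lo, hi] finds its unique zero.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid ** 3 - mid + (mid - x) / tau > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

lam, tau, n = -1.0, 0.1, 60
x, y = 0.2, -0.1                 # nearby starting points on either side of 0
d0 = abs(x - y)
ratios = []
for k in range(1, n + 1):
    x, y = prox_step(x, tau), prox_step(y, tau)
    bound = d0 * (1.0 + lam * tau) ** (-k)   # discrete analogue of exp(-lam*t) * d0
    ratios.append(abs(x - y) / bound)
print(max(ratios), x, y)
```

The two discrete solutions separate (one tends to $1$, the other to $-1$), yet their distance never exceeds the bound, which grows like $e^{-\lambda t}$ because $\lambda < 0$.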
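For the evolution variational inequality (Theorem 4.8), given a minimal geodesic $\gamma : [0,1] \to X$ from $\xi(t)$ to $y$, it is easy to see that (4.6) for $y = \gamma(s)$ with small $s > 0$ (so that $d(\xi(t), \gamma(s)) < \pi/2$) implies (4.6) itself. Indeed, since
\[
\begin{aligned}
\frac{d^2(\xi(t+\varepsilon),y) - d^2(\xi(t),y)}{2\varepsilon}
&\le \frac{\{ d(\xi(t+\varepsilon),\gamma(s)) + d(\gamma(s),y) \}^2 - \{ d(\xi(t),\gamma(s)) + d(\gamma(s),y) \}^2}{2\varepsilon} \\
&= \frac{d^2(\xi(t+\varepsilon),\gamma(s)) - d^2(\xi(t),\gamma(s))}{2\varepsilon} + \frac{d(\xi(t+\varepsilon),\gamma(s)) - d(\xi(t),\gamma(s))}{\varepsilon}\, d\big( \gamma(s), y \big) \\
&= \frac{d^2(\xi(t+\varepsilon),\gamma(s)) - d^2(\xi(t),\gamma(s))}{2\varepsilon}\Big( 1 + \frac{2 d(\gamma(s),y)}{d(\xi(t+\varepsilon),\gamma(s)) + d(\xi(t),\gamma(s))} \Big),
\end{aligned}
\]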

we obtain from (4.6) with $y = \gamma(s)$ that
\[
\begin{aligned}
\limsup_{\varepsilon \downarrow 0} \frac{d^2(\xi(t+\varepsilon),y) - d^2(\xi(t),y)}{2\varepsilon}
&\le \Big\{ -\frac{\lambda}{2} d^2\big( \xi(t), \gamma(s) \big) - \phi\big( \xi(t) \big) + \phi\big( \gamma(s) \big) \Big\}\Big( 1 + \frac{d(\gamma(s),y)}{d(\xi(t),\gamma(s))} \Big) \\
&\le \frac{1}{s}\Big\{ -\frac{\lambda}{2} s^2 d^2\big( \xi(t), y \big) - \phi\big( \xi(t) \big) + (1-s)\phi\big( \xi(t) \big) + s\phi(y) - \frac{\lambda}{2}(1-s)s\, d^2\big( \xi(t), y \big) \Big\} \\
&= -\frac{\lambda}{2} d^2\big( \xi(t), y \big) - \phi\big( \xi(t) \big) + \phi(y).
\end{aligned}
\]

5 A Trotter–Kato product formula

This final section is devoted to a further application of our key lemma: a Trotter–Kato product formula for semi-convex functions. See [KM] for the classical setting of convex functions on Hilbert spaces. The Trotter–Kato product formula on metric spaces was established by Stojkovic [Sto] for convex functions on CAT(0)-spaces in terms of ultralimits (see also a recent result [Ba1] in terms of weak convergence), and by Clément and Maas [CM] for functions satisfying the assertion of our key lemma (Lemma 3.1) with $K = 2$ and $\lambda = 0$ (thus including convex functions on CAT(0)-spaces). We stress that, similarly to the previous section, both the squared distance function and potential functions are allowed to be semi-convex in our argument.

5.1 Setting and the main theorem

Assumption 5.1 Let $(X,d)$ be a complete metric space in either Case I or Case II (see the beginning of Section 4), and assume additionally $D := \operatorname{diam} X < \infty$. For $i = 1,2$, we consider a lower semi-continuous, $\lambda_i$-convex function $\phi_i : X \to (-\infty,\infty]$ ($\lambda_i \in \mathbb{R}$) satisfying $D(\phi_1) \cap D(\phi_2) \ne \emptyset$ and the compactness (Assumption 2.1(2)).

We remark that $\lambda_i$ can be negative. The sum $\phi := \phi_1 + \phi_2$ is clearly lower semi-continuous, $(\lambda_1 + \lambda_2)$-convex and enjoys Assumption 2.1(2) (with $\tau_*(\phi) = \infty$) since $\phi_i$ is bounded below (Remark 2.2) and
\[
\{ x \in X \mid \phi(x) \le Q \} \subset \{ x \in X \mid \phi_1(x) \le Q - \inf_X \phi_2 \}.
\]

Given $z_0 \in D(\phi) = D(\phi_1) \cap D(\phi_2)$ and a partition $\mathcal{P}_\tau$, we consider the discrete variational schemes for $\phi_1$ and $\phi_2$ in turn, namely
\[
z_\tau^0 := z_0, \quad \text{choose arbitrary } \hat{z}_\tau^k \in J_{\tau_k}^{\phi_1}(z_\tau^{k-1}) \text{ and then } z_\tau^k \in J_{\tau_k}^{\phi_2}(\hat{z}_\tau^k) \text{ for } k \in \mathbb{N}. \tag{5.1}
\]
If $\phi_1 = \phi_2$, then this scheme reduces to (2.1) for $\phi$ with respect to the partition
\[
\Big\{ 0 < \frac{t_\tau^1}{2} < t_\tau^1 < \frac{t_\tau^1 + t_\tau^2}{2} < t_\tau^2 < \cdots \Big\}.
\]

The Trotter–Kato product formula asserts that $\{z_\tau^k\}_{k \ge 0}$ converges to the gradient curve of $\phi$ emanating from $z_0$ in an appropriate sense. This is useful when $\phi_1$ and $\phi_2$ are easier to handle than their sum $\phi$. An additional difficulty (in the discrete scheme) compared with the direct variational approximation for $\phi$ is that we have a priori no control of $\phi_2(\hat{z}_\tau^k) - \phi_2(z_\tau^{k-1})$ and $\phi_1(z_\tau^k) - \phi_1(\hat{z}_\tau^k)$ (both are nonpositive if $\phi_1 = \phi_2$). Thus we suppose the following condition.

Assumption 5.2 Given $z_0 \in D(\phi)$ and a partition $\mathcal{P}_\tau$, set
\[
\delta_\tau^k(z_0) := \max\{ 0,\ \phi_2(\hat{z}_\tau^k) - \phi_2(z_\tau^{k-1}),\ \phi_1(z_\tau^k) - \phi_1(\hat{z}_\tau^k) \}
\]
for $k \in \mathbb{N}$, suppressing the dependence on the choice of $\{\hat{z}_\tau^k, z_\tau^k\}_{k \in \mathbb{N}}$. Assume that, for any $\varepsilon, T > 0$, there is $\Delta_\varepsilon^T(z_0) < \infty$ such that
\[
\sum_{k=1}^N \delta_\tau^k(z_0) \le \Delta_\varepsilon^T(z_0)
\]
for any $\mathcal{P}_\tau$ with $|\tau| < \varepsilon$, any $N \in \mathbb{N}$ with $t_\tau^N \le T$, and any solution $\{\hat{z}_\tau^k, z_\tau^k\}_{k \in \mathbb{N}}$ to (5.1).

This in particular guarantees that $\hat{z}_\tau^k \in D(\phi)$ and $z_\tau^k \in D(\phi)$.

Example 5.3 One of the simplest examples satisfying Assumption 5.2 is given by pairs of Lipschitz functions. If both $\phi_1$ and $\phi_2$ are $L$-Lipschitz, then
\[
d^2(z_\tau^{k-1}, \hat{z}_\tau^k) \le 2\tau_k\{\phi_1(z_\tau^{k-1}) - \phi_1(\hat{z}_\tau^k)\} \le 2\tau_k L\, d(z_\tau^{k-1}, \hat{z}_\tau^k).
\]
Hence $d(z_\tau^{k-1}, \hat{z}_\tau^k) \le 2L\tau_k$ and similarly $d(\hat{z}_\tau^k, z_\tau^k) \le 2L\tau_k$. Thus we find
\[
\sum_{k=1}^N \delta_\tau^k(z_0) \le L\sum_{k=1}^N \max\{ d(z_\tau^{k-1}, \hat{z}_\tau^k),\ d(\hat{z}_\tau^k, z_\tau^k) \} \le 2L^2 t_\tau^N.
\]
Notice that $\Delta_\varepsilon^T(z_0)$ is taken independently of $\varepsilon$ and $z_0$ in this case.

See [CM, Proposition 1.7] for other examples. Our assumptions are comparable with those in [CM]. (For the sake of simplicity, we do not intend to minimize the assumptions in this section.)

To state the main theorem of the section, we introduce the interpolated curve $\bar{z}_\tau$ similarly to §4.1:
\[
\bar{z}_\tau(0) := z_0, \qquad \bar{z}_\tau(t) := z_\tau^k \quad \text{for } t \in (t_\tau^{k-1}, t_\tau^k].
\]

Theorem 5.4 (A Trotter–Kato product formula) Let Assumptions 5.1, 5.2 be satisfied. Given $z_0 \in D(\phi)$, the curve $\bar{z}_\tau$ converges to the gradient curve $\xi := G(\cdot, z_0)$ of $\phi$ (constructed in the previous section) as $|\tau| \to 0$, uniformly on each bounded interval $[0,T]$.

Similarly to the previous section, we will discuss under the global $K$-convexity of the squared distance function. Thus $\operatorname{diam} X < \pi/2$ is implicitly assumed in Case I; however, this costs no generality, as we explained in §4.6.
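The alternating scheme (5.1) can be sketched on the real line. The example below is an assumption-laden toy case, not the paper's setting in full: it takes $\phi_1(y) = y^2/2$ and $\phi_2(y) = |y-1|$ (both Lipschitz on a bounded interval containing the iterates, in the spirit of Example 5.3), for which both one-step minimizers have closed forms. The names `prox_phi1`, `prox_phi2`, `splitting` and `gradient_curve` are hypothetical.

```python
import math

def prox_phi1(x, tau):
    """Minimiser of phi1(y) + (y - x)**2/(2*tau) for phi1(y) = y**2/2."""
    return x / (1.0 + tau)

def prox_phi2(x, tau):
    """Minimiser of phi2(y) + (y - x)**2/(2*tau) for phi2(y) = |y - 1|.

    This is soft-thresholding of x around the kink at 1.
    """
    r = x - 1.0
    return 1.0 + math.copysign(max(abs(r) - tau, 0.0), r)

def splitting(z0, tau, n):
    """Scheme (5.1) on a uniform partition: a phi1-step followed by a phi2-step."""
    z = z0
    for _ in range(n):
        z_hat = prox_phi1(z, tau)
        z = prox_phi2(z_hat, tau)
    return z

def gradient_curve(z0, t):
    """Gradient curve of phi1 + phi2 started at z0 > 1.

    It solves x' = -x - 1 until it reaches 1 and then stays there,
    because 0 lies in the subdifferential [0, 2] of the sum at 1.
    """
    return max((z0 + 1.0) * math.exp(-t) - 1.0, 1.0)

z0, tau = 3.0, 0.01
early = splitting(z0, tau, 50)     # approximates the flow at time 0.5
late = splitting(z0, tau, 200)     # approximates the flow at time 2.0
print(early, gradient_curve(z0, 0.5), late, gradient_curve(z0, 2.0))
```

In this example the split iterates track the gradient curve of the sum up to an error of order $\tau$ and then rest at the minimizer $1$, which is the kind of convergence Theorem 5.4 asserts in general.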

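5.2 Preliminary estimates

Fix $z_0 \in D(\phi)$, $\mathcal{P}_\tau$ with $|\tau| < \varepsilon$ and $\{\hat{z}_\tau^k, z_\tau^k\}_{k \in \mathbb{N}}$ solving (5.1).

Lemma 5.5 (i) For each $N \in \mathbb{N}$ with $t_\tau^N \le T$, we have
\[
\max_{i=1,2}\{ \phi_i(z_\tau^N) - \phi_i(z_0),\ \phi_i(\hat{z}_\tau^N) - \phi_i(z_0) \} \le \sum_{k=1}^N \delta_\tau^k(z_0) \le \Delta_\varepsilon^T(z_0).
\]
(ii) For any $l \le k$ with $t_\tau^k \le T$, we have
\[
\max\{ d(z_\tau^{l-1}, \hat{z}_\tau^k),\ d(z_\tau^{l-1}, z_\tau^k),\ d(\hat{z}_\tau^l, \hat{z}_\tau^k),\ d(\hat{z}_\tau^l, z_\tau^k) \}
\le \sqrt{2(t_\tau^k - t_\tau^{l-1})}\Big\{ \sqrt{\phi_1(z_0) - \inf_X \phi_1 + 2\Delta_\varepsilon^T(z_0)} + \sqrt{\phi_2(z_0) - \inf_X \phi_2 + 2\Delta_\varepsilon^T(z_0)} \Big\}.
\]

Proof. (i) This is straightforward from the definition of $\delta_\tau^k(z_0)$. We know $\phi_1(\hat{z}_\tau^k) \le \phi_1(z_\tau^{k-1})$ and hence
\[
\phi_1(z_\tau^N) = \phi_1(z_0) + \sum_{k=1}^N \{ \phi_1(z_\tau^k) - \phi_1(\hat{z}_\tau^k) + \phi_1(\hat{z}_\tau^k) - \phi_1(z_\tau^{k-1}) \} \le \phi_1(z_0) + \sum_{k=1}^N \delta_\tau^k(z_0).
\]
Similarly we find $\phi_1(\hat{z}_\tau^N) \le \phi_1(z_0) + \sum_{k=1}^{N-1} \delta_\tau^k(z_0)$ and
\[
\phi_2(z_\tau^N) \le \phi_2(\hat{z}_\tau^N) \le \phi_2(z_0) + \sum_{k=1}^N \delta_\tau^k(z_0).
\]
(ii) It follows from the Cauchy–Schwarz inequality that
\[
\begin{aligned}
\max\{ d(z_\tau^{l-1}, \hat{z}_\tau^k),\ d(z_\tau^{l-1}, z_\tau^k),\ d(\hat{z}_\tau^l, \hat{z}_\tau^k),\ d(\hat{z}_\tau^l, z_\tau^k) \}
&\le \sum_{m=l}^k \{ d(z_\tau^{m-1}, \hat{z}_\tau^m) + d(\hat{z}_\tau^m, z_\tau^m) \} \\
&\le \sqrt{\sum_{m=l}^k \tau_m}\,\sqrt{\sum_{m=l}^k \frac{d^2(z_\tau^{m-1}, \hat{z}_\tau^m)}{\tau_m}} + \sqrt{\sum_{m=l}^k \tau_m}\,\sqrt{\sum_{m=l}^k \frac{d^2(\hat{z}_\tau^m, z_\tau^m)}{\tau_m}} \\
&\le \sqrt{2(t_\tau^k - t_\tau^{l-1})}\Big\{ \sqrt{\sum_{m=l}^k \{\phi_1(z_\tau^{m-1}) - \phi_1(\hat{z}_\tau^m)\}} + \sqrt{\sum_{m=l}^k \{\phi_2(\hat{z}_\tau^m) - \phi_2(z_\tau^m)\}} \Big\}.
\end{aligned}
\]
Note that, by (i),
\[
\begin{aligned}
\sum_{m=l}^k \{\phi_1(z_\tau^{m-1}) - \phi_1(\hat{z}_\tau^m)\}
&\le \sum_{m=l}^k \{\phi_1(z_\tau^{m-1}) - \phi_1(z_\tau^m) + \delta_\tau^m(z_0)\} \\
&= \phi_1(z_\tau^{l-1}) - \phi_1(z_\tau^k) + \sum_{m=l}^k \delta_\tau^m(z_0) \\
&\le \phi_1(z_0) - \inf_X \phi_1 + 2\Delta_\varepsilon^T(z_0).
\end{aligned}
\]
Similarly we obtain
\[
\sum_{m=l}^k \{\phi_2(\hat{z}_\tau^m) - \phi_2(z_\tau^m)\} \le \phi_2(z_0) - \inf_X \phi_2 + 2\Delta_\varepsilon^T(z_0).
\]
This completes the proof.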

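Thanks to (ii) above, as $|\tau| \to 0$, we obtain the uniform convergence of a subsequence of $\bar{z}_\tau : [0,T] \to X$ to a Hölder continuous curve $\zeta : [0,T] \to X$ with $\zeta(0) = z_0$. Our goal is to show that $\zeta$ coincides with the gradient curve $\xi$ of $\phi$. Then $\bar{z}_\tau$ uniformly converges to $\xi$ as $|\tau| \to 0$ without passing to subsequences. Observe that the uniformity can be seen by contradiction: the existence of $\rho > 0$ such that $\sup_{t \in [0,T]} d(\bar{z}_{\tau_i}(t), \xi(t)) \ge \rho$ for all $i$ along a sequence with $\lim_{i \to \infty} |\tau_i| = 0$ contradicts the uniform convergence of a subsequence of $\{\bar{z}_{\tau_i}\}_{i \in \mathbb{N}}$.

The following key estimate can be thought of as a discrete version of the evolution variational inequality (compare this with [CM, Lemma 2.1]).

Lemma 5.6 Assume $\lambda_i|\tau| > -1$ for $i = 1,2$. For any $w \in D(\phi)$ and $k \in \mathbb{N}$ with $t_\tau^k \le T$, we have
\[
\begin{aligned}
e^{(\lambda_{1,\tau} + \lambda_{2,\tau}) t_\tau^k} d^2(z_\tau^k, w)
&\le e^{(\lambda_{1,\tau} + \lambda_{2,\tau}) t_\tau^{k-1}} d^2(z_\tau^{k-1}, w) + 2e^{\lambda_{2,\tau} t_\tau^{k-1} + \lambda_{1,\tau} t_\tau^k}\tau_k\{\phi(w) - \phi(z_\tau^k) + \delta_\tau^k(z_0)\} \\
&\quad - 2K' e^{\lambda_{2,\tau} t_\tau^{k-1} + \lambda_{1,\tau} t_\tau^k}\tau_k\{\phi(z_\tau^{k-1}) - \phi(z_\tau^k) + 2\delta_\tau^k(z_0)\} + \tau_k O_{z_0,\varepsilon,T}\big(\sqrt{\tau_k}\big),
\end{aligned}
\]
where $K' := \min\{0,K\}$ and
\[
\lambda_{i,\tau} := \frac{\log(1 + \lambda_i|\tau|)}{|\tau|}, \quad i = 1,2.
\]

Proof. The proof is based on calculations similar to Lemma 4.6. Applying Lemma 3.1 to the steps $z_\tau^{k-1} \to \hat{z}_\tau^k$ and $\hat{z}_\tau^k \to z_\tau^k$, we have
\[
\begin{aligned}
(1 + \lambda_1\tau_k) d^2(\hat{z}_\tau^k, w) - d^2(z_\tau^{k-1}, w) &\le 2\tau_k\{\phi_1(w) - \phi_1(\hat{z}_\tau^k)\} - 2K'\tau_k\{\phi_1(z_\tau^{k-1}) - \phi_1(\hat{z}_\tau^k)\}, \\
(1 + \lambda_2\tau_k) d^2(z_\tau^k, w) - d^2(\hat{z}_\tau^k, w) &\le 2\tau_k\{\phi_2(w) - \phi_2(z_\tau^k)\} - 2K'\tau_k\{\phi_2(\hat{z}_\tau^k) - \phi_2(z_\tau^k)\}.
\end{aligned}
\]
Thus we find
\[
\begin{aligned}
(1 + \lambda_2\tau_k) d^2(z_\tau^k, w) - d^2(z_\tau^{k-1}, w)
&\le -\lambda_1\tau_k d^2(\hat{z}_\tau^k, w) + 2\tau_k\{\phi(w) - \phi_1(\hat{z}_\tau^k) - \phi_2(z_\tau^k)\} \\
&\quad - 2K'\tau_k\{\phi_1(z_\tau^{k-1}) - \phi_1(\hat{z}_\tau^k) + \phi_2(\hat{z}_\tau^k) - \phi_2(z_\tau^k)\}.
\end{aligned}
\]
Note that, by Lemma 5.5(i),
\[
\begin{aligned}
\big| d^2(\hat{z}_\tau^k, w) - d^2(z_\tau^{k-1}, w) \big|
&\le \{ d(\hat{z}_\tau^k, w) + d(z_\tau^{k-1}, w) \}\, d(\hat{z}_\tau^k, z_\tau^{k-1}) \\
&\le 2D\sqrt{2\tau_k\{\phi_1(z_\tau^{k-1}) - \phi_1(\hat{z}_\tau^k)\}} \\
&\le 2D\sqrt{2\tau_k\{\phi_1(z_0) + \Delta_\varepsilon^T(z_0) - \inf_X \phi_1\}} = O_{z_0,\varepsilon,T}\big(\sqrt{\tau_k}\big).
\end{aligned}
\]
Moreover,
\[
\phi(w) - \phi_1(\hat{z}_\tau^k) - \phi_2(z_\tau^k) \le \phi(w) - \phi(z_\tau^k) + \delta_\tau^k(z_0)
\]
and similarly
\[
\phi_1(z_\tau^{k-1}) - \phi_1(\hat{z}_\tau^k) + \phi_2(\hat{z}_\tau^k) - \phi_2(z_\tau^k) \le \phi(z_\tau^{k-1}) - \phi(z_\tau^k) + 2\delta_\tau^k(z_0).
\]
Combining these yields
\[
\begin{aligned}
(1 + \lambda_2\tau_k) d^2(z_\tau^k, w)
&\le \frac{1}{1 + \lambda_1\tau_k} d^2(z_\tau^{k-1}, w) + 2\tau_k\{\phi(w) - \phi(z_\tau^k) + \delta_\tau^k(z_0)\} \\
&\quad - 2K'\tau_k\{\phi(z_\tau^{k-1}) - \phi(z_\tau^k) + 2\delta_\tau^k(z_0)\} + \tau_k O_{z_0,\varepsilon,T}\big(\sqrt{\tau_k}\big).
\end{aligned}
\]
Multiply both sides by
\[
e^{\lambda_{2,\tau} t_\tau^{k-1} + \lambda_{1,\tau} t_\tau^k} = e^{(\lambda_{1,\tau} + \lambda_{2,\tau}) t_\tau^{k-1} + \lambda_{1,\tau}\tau_k} = e^{(\lambda_{1,\tau} + \lambda_{2,\tau}) t_\tau^k - \lambda_{2,\tau}\tau_k}.
\]
Then, recalling (4.4), we obtain the desired estimate.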

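5.3 Proof of Theorem 5.4

By Lemma 5.6,
\[
\begin{aligned}
\frac{d^2(z_\tau^k, w) - d^2(z_\tau^{k-1}, w)}{2\tau_k}
&= \frac{e^{(\lambda_{1,\tau} + \lambda_{2,\tau}) t_\tau^k} d^2(z_\tau^k, w) - e^{(\lambda_{1,\tau} + \lambda_{2,\tau}) t_\tau^{k-1}} d^2(z_\tau^{k-1}, w)}{2\tau_k\, e^{(\lambda_{1,\tau} + \lambda_{2,\tau}) t_\tau^k}} + \frac{e^{-(\lambda_{1,\tau} + \lambda_{2,\tau})\tau_k} - 1}{2\tau_k} d^2(z_\tau^{k-1}, w) \\
&\le e^{-\lambda_{2,\tau}\tau_k}\{\phi(w) - \phi(z_\tau^k) + \delta_\tau^k(z_0)\} - K' e^{-\lambda_{2,\tau}\tau_k}\{\phi(z_\tau^{k-1}) - \phi(z_\tau^k) + 2\delta_\tau^k(z_0)\} \\
&\quad + \frac{e^{-(\lambda_{1,\tau} + \lambda_{2,\tau})\tau_k} - 1}{2\tau_k} d^2(z_\tau^{k-1}, w) + O\big(\sqrt{|\tau|}\big) \\
&= \phi(w) - \phi(z_\tau^k) - K'\{\phi(z_\tau^{k-1}) - \phi(z_\tau^k)\} - \frac{\lambda_1 + \lambda_2}{2} d^2(z_\tau^{k-1}, w) + (1 - 2K')\delta_\tau^k(z_0) + O\big(\sqrt{|\tau|}\big).
\end{aligned}
\]
We used the bound of $\phi(z_\tau^k)$ (Lemma 5.5(i)) to estimate the error terms. Denote by $\{x_\tau^k\}_{k \ge 0}$ a discrete solution of the variational scheme (2.1) for $\phi$ with $x_\tau^0 = z_0$. We recall from the proof of Theorem 4.1 that, putting $\lambda := \lambda_1 + \lambda_2$,
\[
\frac{d^2(x_\tau^k, y) - d^2(x_\tau^{k-1}, y)}{2\tau_k} \le \phi(y) - \phi(x_\tau^k) - K'\{\phi(x_\tau^{k-1}) - \phi(x_\tau^k)\} - \frac{\lambda}{2} d^2(x_\tau^k, y).
\]
Applying these inequalities with $w = x_\tau^{k-1}$ and $y = z_\tau^k$ to
\[
d^2(x_\tau^N, z_\tau^N) = \sum_{k=1}^N \big\{ d^2(x_\tau^k, z_\tau^k) - d^2(x_\tau^{k-1}, z_\tau^k) + d^2(x_\tau^{k-1}, z_\tau^k) - d^2(x_\tau^{k-1}, z_\tau^{k-1}) \big\},
\]
we obtain, for $N$ with $t_\tau^N \le T$,
\[
\begin{aligned}
d^2(x_\tau^N, z_\tau^N)
&\le 2\sum_{k=1}^N \tau_k\Big\{ \phi(z_\tau^k) - \phi(x_\tau^k) - K'\{\phi(x_\tau^{k-1}) - \phi(x_\tau^k)\} - \frac{\lambda}{2} d^2(x_\tau^k, z_\tau^k) \Big\} \\
&\quad + 2\sum_{k=1}^N \tau_k\Big\{ \phi(x_\tau^{k-1}) - \phi(z_\tau^k) - K'\{\phi(z_\tau^{k-1}) - \phi(z_\tau^k)\} - \frac{\lambda}{2} d^2(x_\tau^{k-1}, z_\tau^{k-1}) \Big\} \\
&\quad + 2\sum_{k=1}^N \tau_k(1 - 2K')\delta_\tau^k(z_0) + t_\tau^N\, O\big(\sqrt{|\tau|}\big) \\
&\le (2 - 2K')\sum_{k=1}^N \tau_k\{\phi(x_\tau^{k-1}) - \phi(x_\tau^k)\} - 2K'\sum_{k=1}^N \tau_k\{\phi(z_\tau^{k-1}) - \phi(z_\tau^k)\} \\
&\quad - \lambda\sum_{k=1}^N \tau_k\{ d^2(x_\tau^{k-1}, z_\tau^{k-1}) + d^2(x_\tau^k, z_\tau^k) \} + 2(1 - 2K')|\tau|\,\Delta_\varepsilon^T(z_0) + t_\tau^N\, O\big(\sqrt{|\tau|}\big).
\end{aligned}
\]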

Notice that
\[
\sum_{k=1}^N \tau_k\{\phi(x_\tau^{k-1}) - \phi(x_\tau^k)\} \le |\tau|\sum_{k=1}^N \{\phi(x_\tau^{k-1}) - \phi(x_\tau^k)\} \le |\tau|\Big\{ \phi(z_0) - \inf_X \phi \Big\}.
\]
Moreover, since $\phi(z_\tau^{k-1}) - \phi(z_\tau^k) \ge -2\delta_\tau^k(z_0)$,
\[
\sum_{k=1}^N \tau_k\{\phi(z_\tau^{k-1}) - \phi(z_\tau^k)\} \le |\tau|\sum_{k=1}^N \{\phi(z_\tau^{k-1}) - \phi(z_\tau^k) + 2\delta_\tau^k(z_0)\} \le |\tau|\Big\{ \phi(z_0) - \inf_X \phi + 2\Delta_\varepsilon^T(z_0) \Big\}.
\]
Therefore, letting $i \to \infty$ in the sequence $\mathcal{P}_{\tau_i}$ for which $\bar{z}_{\tau_i}$ converges to $\zeta$, we find
\[
d^2\big( \xi(T), \zeta(T) \big) \le -2\lambda\int_0^T d^2(\xi, \zeta)\, dt.
\]
This implies that the nonnegative function $f(T) := \int_0^T d^2(\xi, \zeta)\, dt$ satisfies $f' \le -2\lambda f$ and hence $(e^{2\lambda T} f(T))' \le 0$. Thus $f \equiv 0$, which completes the proof of $\zeta = \xi$ and of Theorem 5.4 (recall the paragraph following Lemma 5.5).

References

[AG] L. Ambrosio and N. Gigli, A user's guide to optimal transport, Modeling and optimisation of flows on networks, 1–155, Lecture Notes in Math., 2062, Springer, 2013.
[AGS1] L. Ambrosio, N. Gigli and G. Savaré, Gradient flows in metric spaces and in the space of probability measures, Birkhäuser Verlag, Basel, 2005.
[AGS2] L. Ambrosio, N. Gigli and G. Savaré, Calculus and heat flow in metric measure spaces and applications to spaces with Ricci bounds from below, Invent. Math. 195 (2014), 289–391.
[AGS3] L. Ambrosio, N. Gigli and G. Savaré, Metric measure spaces with Riemannian Ricci curvature bounded from below, Duke Math. J. 163 (2014), 1405–1490.
[ASZ] L. Ambrosio, G. Savaré and L. Zambotti, Existence and stability for Fokker–Planck equations with log-concave reference measure, Probab. Theory Related Fields 145 (2009), 517–564.
[Ba1] M. Bačák, A new proof of the Lie–Trotter–Kato formula in Hadamard spaces, Commun. Contemp. Math. 16 (2014), 1350044, 15 pp.
[Ba2] M. Bačák, Convex analysis and optimization in Hadamard spaces, Walter de Gruyter & Co., Berlin, 2014.
[BCL] K. Ball, E. A. Carlen and E. H. Lieb, Sharp uniform convexity and smoothness inequalities for trace norms, Invent. Math. 115 (1994), 463–482.

[Br] H. Brézis, Opérateurs maximaux monotones et semi-groupes de contractions dans les espaces de Hilbert (French), North-Holland Publishing Co., Amsterdam-London; American Elsevier Publishing Co., Inc., New York, 1973.
[BBI] D. Burago, Yu. Burago and S. Ivanov, A course in metric geometry, American Mathematical Society, Providence, RI, 2001.
[CM] P. Clément and J. Maas, A Trotter product formula for gradient flows in metric spaces, J. Evol. Equ. 11 (2011), 405–427. Erratum in J. Evol. Equ. 13 (2013), 251–252.
[CL] M. G. Crandall and T. M. Liggett, Generation of semi-groups of nonlinear transformations on general Banach spaces, Amer. J. Math. 93 (1971), 265–298.
[DG] E. De Giorgi, New problems on minimizing movements, Boundary value problems for partial differential equations and applications, 81–98, RMA Res. Notes Appl. Math., 29, Masson, Paris, 1993. (Also in: E. De Giorgi, Selected papers, 699–713, Edited by Luigi Ambrosio et al., Springer-Verlag, Berlin, 2006.)
[EKS] M. Erbar, K. Kuwada and K.-T. Sturm, On the equivalence of the entropic curvature-dimension condition and Bochner's inequality on metric measure spaces, Invent. Math. 201 (2015), 993–1071.
[Gi1] N. Gigli, On the heat flow on metric measure spaces: existence, uniqueness and stability, Calc. Var. Partial Differential Equations 39 (2010), 101–120.
[Gi2] N. Gigli, An overview of the proof of the splitting theorem in spaces with non-negative Ricci curvature, Anal. Geom. Metr. Spaces 2 (2014), 169–213.
[Gi3] N. Gigli, The splitting theorem in non-smooth context, Preprint (2013). Available at arXiv:1302.5555
[GKO] N. Gigli, K. Kuwada and S. Ohta, Heat flow on Alexandrov spaces, Comm. Pure Appl. Math. 66 (2013), 307–331.
[JKO] R. Jordan, D. Kinderlehrer and F. Otto, The variational formulation of the Fokker–Planck equation, SIAM J. Math. Anal. 29 (1998), 1–17.
[Jo] J. Jost, Convex functionals and generalized harmonic maps into spaces of nonpositive curvature, Comment. Math. Helv. 70 (1995), 659–673.
[KM] T. Kato and K. Masuda, Trotter's product formula for nonlinear semigroups generated by the subdifferentials of convex functionals, J. Math. Soc. Japan 30 (1978), 169–178.
[Ly] A. Lytchak, Open map theorem for metric spaces, St. Petersburg Math. J. 17 (2006), 477–491.
[Ma] U. F. Mayer, Gradient flows on nonpositively curved metric spaces and harmonic maps, Comm. Anal. Geom. 6 (1998), 199–253.
[Oh1] S. Ohta, Convexities of metric spaces, Geom. Dedicata 125 (2007), 225–250.
[Oh2] S. Ohta, Gradient flows on Wasserstein spaces over compact Alexandrov spaces, Amer. J. Math. 131 (2009), 475–516.