On damped second-order gradient systems

Similar documents
1 Lyapunov theory of stability

Proximal Alternating Linearized Minimization for Nonconvex and Nonsmooth Problems

An introduction to Mathematical Theory of Control

Convergence Rate of Nonlinear Switched Systems

Optimization and Optimal Control in Banach Spaces

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true

UNIQUENESS OF POSITIVE SOLUTION TO SOME COUPLED COOPERATIVE VARIATIONAL ELLIPTIC SYSTEMS

Topological properties

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Part III. 10 Topological Space Basics. Topological Spaces

Convex Functions and Optimization

Connectedness. Proposition 2.2. The following are equivalent for a topological space (X, T ).

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Hedy Attouch, Jérôme Bolte, Benar Svaiter. To cite this version: HAL Id: hal

Elements of Convex Optimization Theory

2. The Concept of Convergence: Ultrafilters and Nets

Least Sparsity of p-norm based Optimization Problems with p > 1

Equilibria with a nontrivial nodal set and the dynamics of parabolic equations on symmetric domains

Math Tune-Up Louisiana State University August, Lectures on Partial Differential Equations and Hilbert Space

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε

Douglas-Rachford splitting for nonconvex feasibility problems

Introduction to Real Analysis Alternative Chapter 1

Output Input Stability and Minimum-Phase Nonlinear Systems

Integral Jensen inequality

SYMMETRY RESULTS FOR PERTURBED PROBLEMS AND RELATED QUESTIONS. Massimo Grosi Filomena Pacella S. L. Yadava. 1. Introduction

The Skorokhod reflection problem for functions with discontinuities (contractive case)

Metric Spaces and Topology

(x k ) sequence in F, lim x k = x x F. If F : R n R is a function, level sets and sublevel sets of F are any sets of the form (respectively);

LECTURE 15: COMPLETENESS AND CONVEXITY

INTRODUCTION TO REAL ANALYTIC GEOMETRY

V (v i + W i ) (v i + W i ) is path-connected and hence is connected.

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games

The Dirichlet s P rinciple. In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation:

Convexity in R n. The following lemma will be needed in a while. Lemma 1 Let x E, u R n. If τ I(x, u), τ 0, define. f(x + τu) f(x). τ.

CHAPTER 7. Connectedness

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS

Estimates for probabilities of independent events and infinite series

Chapter 2 Convex Analysis

From error bounds to the complexity of first-order descent methods for convex functions

APPLICATIONS OF DIFFERENTIABILITY IN R n.

An asymptotic ratio characterization of input-to-state stability

An introduction to Birkhoff normal form

Second order forward-backward dynamical systems for monotone inclusion problems

Global Maxwellians over All Space and Their Relation to Conserved Quantites of Classical Kinetic Equations

arxiv: v2 [math.ap] 1 Jul 2011

Received: 15 December 2010 / Accepted: 30 July 2011 / Published online: 20 August 2011 Springer and Mathematical Optimization Society 2011

Stability of nonlinear locally damped partial differential equations: the continuous and discretized problems. Part I

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space

Characterizations of Lojasiewicz inequalities: subgradient flows, talweg, convexity.

Near-Potential Games: Geometry and Dynamics

Propagating terraces and the dynamics of front-like solutions of reaction-diffusion equations on R

Topology. Xiaolong Han. Department of Mathematics, California State University, Northridge, CA 91330, USA address:

Spring School on Variational Analysis Paseky nad Jizerou, Czech Republic (April 20 24, 2009)

Math 201 Topology I. Lecture notes of Prof. Hicham Gebran

NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM

Tools from Lebesgue integration

B. Appendix B. Topological vector spaces

Optimization Theory. A Concise Introduction. Jiongmin Yong

NOTES ON CALCULUS OF VARIATIONS. September 13, 2012

Existence and Uniqueness

Def. A topological space X is disconnected if it admits a non-trivial splitting: (We ll abbreviate disjoint union of two subsets A and B meaning A B =

arxiv: v2 [math.oc] 21 Nov 2017

Threshold behavior and non-quasiconvergent solutions with localized initial data for bistable reaction-diffusion equations

1 Relative degree and local normal forms

On the distributional divergence of vector fields vanishing at infinity

arxiv: v1 [math.oc] 22 Sep 2016

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

Majorization-minimization procedures and convergence of SQP methods for semi-algebraic and tame programs

After taking the square and expanding, we get x + y 2 = (x + y) (x + y) = x 2 + 2x y + y 2, inequality in analysis, we obtain.

Math 320-2: Midterm 2 Practice Solutions Northwestern University, Winter 2015

2 A Model, Harmonic Map, Problem

Introduction and Preliminaries

Numerical Approximation of Phase Field Models

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers.

Extreme points of compact convex sets

LECTURE 6. CONTINUOUS FUNCTIONS AND BASIC TOPOLOGICAL NOTIONS

Math 341: Convex Geometry. Xi Chen

2. Function spaces and approximation

MODULUS OF CONTINUITY OF THE DIRICHLET SOLUTIONS

at time t, in dimension d. The index i varies in a countable set I. We call configuration the family, denoted generically by Φ: U (x i (t) x j (t))

ODE Final exam - Solutions

1 Directional Derivatives and Differentiability

Laplace s Equation. Chapter Mean Value Formulas

Analysis in weighted spaces : preliminary version

2 Sequences, Continuity, and Limits

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45

Least Squares Based Self-Tuning Control Systems: Supplementary Notes

Measurable Choice Functions

5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing.

FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES. 1. Compact Sets

A user s guide to Lojasiewicz/KL inequalities

Appendix E : Note on regular curves in Euclidean spaces

Near-Potential Games: Geometry and Dynamics

Lecture 4 Lebesgue spaces and inequalities

Set, functions and Euclidean space. Seungjin Han

Tangent spaces, normals and extrema

Contents. O-minimal geometry. Tobias Kaiser. Universität Passau. 19. Juli O-minimal geometry

Transcription:

On damped second-order gradient systems Pascal Bégout, Jérôme Bolte and Mohamed Ali Jendoubi Abstract Using small deformations of the total energy, as introduced in [31], we establish that damped second order gradient systems u (t) + γu (t) + G(u(t)) = 0, may be viewed as quasi-gradient systems. In order to study the asymptotic behavior of these systems, we prove that any (nontrivial) desingularizing function appearing in KL inequality satisfies ϕ(s) c s whenever the original function is definable and C 2. Variants to this result are given. These facts are used in turn to prove that a desingularizing function of the potential G also desingularizes the total energy and its deformed versions. Our approach brings forward several results interesting for their own sake: we provide an asymptotic alternative for quasi-gradient systems, either a trajectory converges, or its norm tends to infinity. The convergence rates are also analyzed by an original method based on a one-dimensional worst-case gradient system. We conclude by establishing the convergence of solutions of damped second order systems in various cases including the definable case. The real-analytic case is recovered and some results concerning convex functions are also derived. Contents 1 Introduction 2 1.1 A global view on previous results....................................... 2 1.2 Main results................................................... 3 2 Structural results: lower bounds for desingularizing functions of C 2 functions 4 2.1 Lower bounds for desingularizing functions of potentials having a simple critical point structure.... 6 2.2 Lower bounds for desingularizing functions of definable C 2 functions................... 7 3 Damped second order gradient systems 10 3.1 Quasi-gradient structure and KL inequalities................................ 10 3.2 Convergence rate of quasi-gradient systems and worst-case dynamics................... 12 3.3 Damped second order systems are quasi-gradient systems......................... 14 4 Convergence results 16 5 Consequences 18 A Appendix: some elements on o-minimal structures 19 References 20 TSE (Institut de Mathématiques de Toulouse, Université Toulouse I Capitole), Manufacture des Tabacs, 21 allée de Brienne, 31015 Toulouse, Cedex 06, France TSE (GREMAQ, Université Toulouse I Capitole), Manufacture des Tabacs, 21 allée de Brienne, 31015 Toulouse, Cedex 06, France. Effort sponsored by the Air Force Office of Scientific Research, Air Force Material Command, USAF, under grant number FA9550-14-1-0056. This research also benefited from the support of the FMJH Program Gaspard Monge in optimization and operations research Université de Carthage, Institut préparatoire aux études scientifiques et techniques, BP 51, 2080 La Marsa, Tunisia E-mail: Pascal.Begout@tse-fr.eu( ), Jerome.Bolte@tse-fr.eu( ), ma.jendoubi@fsb.rnu.tn( ) 2010 Mathematics Subject Classification: 35B40, 34D05, 37N40 Key Words: dissipative dynamical systems, gradient systems, inertial systems, Kurdyka- Lojasiewicz inequality, global convergence 1

1 Introduction 1.1 A global view on previous results In this paper, we develop some new tools for the asymptotic behavior as t goes to infinity of solutions u : R + R N of the following second order system u (t) + γu (t) + G(u(t)) = 0, t R +. (1.1) Here, γ > 0 is a positive real number which can be seen as a damping coefficient, N 1 is an integer and G C 2 (R N ) is a real-valued function. In Mechanics, (1.1) models, among other problems, the motion of an object subject to a force deriving from a potential G (e.g. gravity) and to a viscous friction force γu. In particular, the above may be seen as a qualitative model for the motion of a material point subject to gravity, constrained to evolve on the graph of G and subject to a damping force, further insights and results on this view may be found in [3, 14]. This type of dynamical system has been the subject of several works in various fields and along different perspectives, one can quote for instance [4] for Nonsmooth Mechanics, [12, 11] for recent advances in Optimization and [46] for pioneer works on the topic, partial differential equations and related aspects [34, 41, 5]. The aim of this work is to provide a deeper understanding of the asymptotic behavior of such a system and of the mechanisms behind the stabilization of trajectories at infinity (making each bounded orbit approach some specific critical point). Such behaviors have been widely investigated for gradient systems, u (t) + G(u(t)) = 0, for a long time now. The first decisive steps were made by Lojasiewicz for analytic functions through the introduction of the so-called gradient inequality [44, 43]. Many other works followed among which two important contributions: [13] for convex functions and [42] for definable functions. Surprisingly the asymptotic behavior of the companion dynamics (1.1) has only been recently analyzed. The motivation for studying (1.1) seems to come from three distinct fields PDEs, Mechanics and Optimization. Out of the convex realm [45, 1], the seminal paper is probably [31]. Like many of the works on gradient systems the main assumption, borrowed from Lojasiewicz original contributions, is the analyticity of the function or more precisely the fact that the function satisfies the Lojasiewicz inequality. This work paved the way for many developments: convergence rates studies [33], extension to partial differential equations [47, 39, 38, 32, 37, 34, 19, 27, 26, 35, 5], use of various kind of dampings [17, 18] (see also [16, 36, 29, 40]). Despite the huge amount of subsequent works, some deep questions remained somehow unanswered; in particular it is not clear to see: What are the exact connections between gradient systems and damped second-order gradient systems? Within these relationships, how central is the role of the properties/geometry of the potential function G? Before trying to provide some answers, we recall some fundamental notions related to these questions; they will also constitute the main ingredients in our analysis of (1.1). Quasi-gradient fields. The notion is natural and simple: a vector field V is called quasi-gradient for a function L if it has the same singular point (as L) and if the angle α between the field V and the gradient L remains acute and bounded away from π/2. Proper definitions are recalled in Section 3.1. Of course, such systems have a behavior which is very similar to those of gradient systems (see Theorem 3.2). We refer to [6] and the references therein for further geometrical insights on the topic. Liapunov functions for damped second order gradient systems. The most striking common point between (1.1) and gradient systems is that of a natural Liapunov function. In our case, it is given by the total energy, sum of the potential energy and the kinetic energy, E T (u, v) = G(u) + 1 2 v 2. The above is a Liapunov function in the phase space, more concretely d dt E T (u(t), u (t)) = d ( 1 dt 2 u (t) 2 + G(u(t))) = γ u (t) 2. 2

Contrary to what happens for classical gradient systems the vector field associated with (1.1) is not strictly Lyapunov for E T : it obviously degenerates on the subspace [v = 0] (or [u = 0]). The use of E T is however at the heart of most results attached to this dynamical system. KL functions. A KL function is a function whose values can be reparametrized in the neighborhood of each of its critical point so that the resulting functions become sharp (1). More formally, G is called KL on the slice of level lines [0 < G < r 0 ] def = { } u R N ; 0 < G(u) < r ( 0, if there exists ϕ C 0 [0, r 0 ) ) C 1 (0, r 0 ) concave such that ϕ(0) = 0, ϕ > 0 and (ϕ G)(u) 1, u [0 < G < r 0 ]. Proper definitions and local versions can be found in the next section. The above definition originates in [10] and is based on the fundamental work of Kurdyka [42], where it was introduced in the framework of o-minimal structure (2) as a generalization of the famous Lojasiewicz inequality. KL functions are central in the analysis of gradient systems, the readers are referred to [10] and the references therein. Desingularizing functions. The function appearing above, namely ϕ, is called a desingularizing function: the faster ϕ tends to infinity at 0, the flatter is G around critical points. As opposed to the Lojasiewicz gradient inequality, this behavior, in the o-minimal world, is not necessarily of a power-type. Highly degenerate functions can be met, like for instance G(u) = exp ( 1/p 2 (u) ) where p : R N R is any real polynomial function. This class of functions belongs to the log-exp structure, an o-minimal class that contains semi-algebraic sets and the graph of the exponential function [48]. Finally, observe that if it is obvious that ϕ might have an arbitrarily brutal behavior at 0, it is also pretty clear that the smoothness of G is related to a lower-control of the behavior of ϕ, for instance we must have ϕ (0) = which is not the case in general in the nonsmooth world (see e.g. [9]). 1.2 Main results Several auxiliary theorems were necessary to establish our main result, we believe they are interesting for their own sake. Here they are: An asymptotic alternative for quasi-gradient systems: either a trajectory converges or it escapes to infinity, A general convergence rate result for the solutions of the gradient systems that brings forward a worst-case gradient dynamical system in dimension one, Lower bounds for desingularizing functions of C 2 KL functions. We are now in position to describe the strategy we followed in that paper for the asymptotic study of the damped second order gradient system (1.1). Our method was naturally inspired by the Liapunov function provided in [31]. 1. First we show that E T can be slightly and semi-algebraically (respectively, definably) deformed into a smooth function ET def, so that the gradient of the new energy Edef T makes an uniformly acute angle with the vector field associated with (1.1) this property only holds on bounded sets of the phase space. The system (1.1) appears therefore as a quasi-gradient system for ET def. 2. In a second step we establish/verify that the solutions of the quasi-gradient systems converge whenever they originate from a KL function. We also provide rates of convergence and we explain how they may be naturally and systematically derived from a one-dimensional worst-case gradient dynamics. At this stage it is possible to proceed abstractly to the proof of the convergence of solutions to (1.1) in several cases. For instance the definable case: we simply have to use the fact that ET def is definable whenever G is, so it is a KL function and the conclusion follows. Although direct and fast, this approach has an important drawback from a conceptual viewpoint since it relies on a desingularizing function attached to an auxiliary function ET def whose meaning is unclear. Whatever perspectives we may adopt (Mechanics, Optimization, PDEs), an important question is indeed to understand what happens when G is KL and how the desingularizing function of G actually impacts the convergence of solutions to (1.1). 1 That is, the norms of its gradient remain bounded away from zero. 2 A far reaching concept that generalizes semi-algebraic or (globally) subanalytic classes of sets and functions. 3

3. We answer to this question in the following way. (a) We prove that desingularizing functions of C 2 definable functions have a lower bound. Roughly speaking, we prove that for nontrivial critical points the desingularizing function has the property ϕ(s) c ( s or ) equivalently (3) ϕ (s) c s. (b) We establish that if ϕ is definable and desingularizing for G at u then it is desingularizing for both E T and E def T at (u, 0). 4. We conclude by combining previous results to obtain in particular the convergence of solutions to (1.1) under definability assumptions. We also provide convergence rates that depend on the desingularizing function of G, i.e. on the geometry of the potential. We would like to point out and emphasize two facts that we think are of interest. First the property ϕ(s) c s (see Lemma 2.9 below) is a new result and despite its intuitive aspect the proof is nontrivial. We believe it has an interest in its own sake. More related to our work is the fact that (in the definable case and in many other relevant cases) our results show that the desingularizing function of G is conditioning the asymptotic behavior of solutions of the system. Within an Optimization perspective this means that the complexity, or at least the convergence rate, of the dynamical system is entirely embodied in G when G is smooth. From a mechanical viewpoint, stabilization at infinity is determined by the conditioning of G provided the latter is smooth enough; in other words the intuition that for large time behaviors, the potential has a predominant effect on the system is correct a fact which is of course related to the dissipation of the kinetic energy at a constant rate. Notation. The finite-dimensional space R N (N 1) is endowed with the canonical scalar product.,. whose norm is denoted by.. The product space R N R N is endowed with the natural product metric which we still denote by.,.. We also define for any u R N and r > 0, B(u, r) = {u R N ; u u < r}. When S is a subset of R N its interior is denoted by int S and its closure by S. If F : R N R is a differentiable function, its gradient is denoted by F. When F is a twice differentiable function, its Hessian is denoted by 2 F. The set of critical points of F is defined by { } crit F = u R N ; F (u) = 0. This paper is organized as follows. In Section 2, we provide a lower bound for desingularizing function of C 2 functions under various assumptions, like definability (Proposition 2.8 and Lemma 2.9). In Section 3, we recall the behavior of a first order system having a quasi-gradient structure for some KL function and we provide an asymptotic alternative (Theorem 3.2). In Theorem 3.7, the convergence rate of any solution to a first order system having a quasi-gradient structure is proved to be better than that of a one-dimensional worst-case gradient dynamics (various known results are recovered in a transparent way). Finally, we establish that any function which desingularizes G in (1.1) also desingularizes the total energy and various relevant deformation of the latter (Proposition 3.11). In Section 4, we study the asymptotic behavior of solutions to (1.1) (Theorem 4.1) while in Section 5, we describe several consequences of our main results. Appendix provides, for the comfort of the reader, some elementary facts on o-minimal structures. 2 Structural results: lower bounds for desingularizing functions of C 2 functions To keep the reading smooth and easy, we will not formally define here o-minimal structure. The definition is postponed in Appendix. Let us however recall, at this stage, that the simplest o-minimal structure (containing the graph of the real product) is given by the class of real semi-algebraic sets and functions. A semi-algebraic set is the finite union of sets of the form { } u R N ; p(u) = 0, p i (u) < 0, i I, (2.1) 3 Recall that ϕ is definable. 4

where I is a finite set and p, {p i } i I are real polynomial functions. Let us recall a fundamental concept for dissipative dynamical systems of gradient type. Definition 2.1 (Kurdyka- Lojasiewicz property and desingularizing function). Let G : R N R be a differentiable function. (i) We shall say that G has the KL property at u R N if there exist r 0 > 0, η > 0 and ϕ C([0, r 0 ); R + ) such that 1. ϕ(0) = 0, ϕ C 1 ((0, r 0 ); R + ) concave and ϕ positive on (0, r 0 ), 2. u B(u, η) = G(u) G(u) < r 0 ; and for each u B ( u, η ), such that G(u) G(u), (ϕ G(. ) G(u) )(u) 1. (2.2) Such a function ϕ is called a desingularizing function of G at u on B(u, η). (ii) The function G is called a KL function if it has the KL property at each of its points. The following result is due to Lojasiewicz in its real-analytic version (see e.g. [43, 44]), it was generalized to o-minimal structures and considerably simplified by Kurdyka in [42] (see Appendix). Theorem 2.2 (Kurdyka- Lojasiewicz inequality [42] (4) ). Let O be an o-minimal structure and let G C 1 (R N ; R) be a definable function. Then G is a KL function. Remark 2.3. (a) Theorem 2.2 is of course trivial when u crit G take indeed, ϕ(s) = cs where c = 1+ε G(u) and ε > 0. (b) Restrictions of real-analytic functions to compact sets included in their (open) domain belong to the o-minimal structure of globally analytic sets [25]. They are therefore KL functions (see indeed Example A.2). In some o- minimal structures there are nontrivial functions for which all derivatives vanish on some nonempty set, like G(u) = exp( 1/f 2 (u)) where f 0 is any smooth semi-algebraic function achieving the value 0 (5) (see also Example A.2). For these cases, ϕ is not of power-type as it is the case when G is semi-algebraic or real-analytic. Other types of functions satisfying the KL property in various contexts are provided in [2] (see also Corollary 5.5). (c) Desingularizing functions of definable functions can be chosen to be definable, strictly concave and C k (where k is arbitrary). The following trivial notion is quite convenient. Definition 2.4 (Trivial critical points). A critical point u of a differentiable function G : R N R is called n trivial if u int crit G. It is nontrivial otherwise. Observe that u is nontrivial if, and only if, there exists u n u such that G(u n ) G(u), for any n N. When u is a trivial critical point of G, any concave function ϕ C 0( [0, r 0 ) ) C 1 (0, r 0 ) such that ϕ > 0 and ϕ(0) = 0 is desingularizing at u. An immediate consequence of the KL inequality is a local and strong version of Sard s theorem. Remark 2.5 (Local finiteness of critical values). Let G C 1 (R N ; R) and u R N. Assume that G satisfies the KL property at u on B(u, η). Then u B(u, η) and G(u) = 0 = G(u) = G(u). The simplest functions we can think of with respect to the behavior of the solutions to (1.1) are given by functions with linear gradients, that is quadratic forms G(u) = 1 2 Au, u, u RN, where A M N (R), A T = A. When A 0, it is easy to establish directly that ϕ(s) = 1 λ s (where λ is a nonzero eigenvalue with smallest absolute value) provides a desingularizing function. In the subsections to come, we show that the best we can hope in general for a desingularizing function ϕ attached to a C 2 function G is precisely a quantitative behavior of square-root type. 4 See comments in Appendix. 5 This function is definable in the log-exp structure of Wilkie [48]. 5

2.1 Lower bounds for desingularizing functions of potentials having a simple critical point structure Our first assumption, formally stated below, asserts that points having critical value must be critical points. The assumption is rather strong in general but it will be complemented in the next section by a far more general result for definable functions. Let u crit G. There exists η > 0 such that for any u B(u, η), (2.3) ( ) G(u) = G(u) = u crit G. Example 2.6. (a) When N = 1 and G C 1 is KL then assumption (2.3) holds. n [If the result does not hold then there exists a sequence (x n) n N such that x n u and G(x n) = G(u), (2.4) G (x n) 0, (2.5) for any n N. Without loss of generality, we may assume that (x n) n N is monotone, say decreasing. From (2.4) (2.5) and Rolle s Theorem, there exists a sequence (u n) n N such that x n+1 < u n < x n, G (u n) = 0, G(u n) G(u), for any n N. Thus G(u n) are critical values distinct from G(u) such that G(u n) G(u); this contradicts the local finiteness of critical values see Remark 2.5.] (b) Of course, the result in (a) cannot be extended to higher dimensions. Consider for instance G : R 2 R, G(u 1, u 2 ) = u 2 1 u 2 2, which is obviously KL. One has G(u) = 0 if, and only if, u = 0, yet G(t, t) = 0 for any t in R. (c) If G is convex, (2.3) holds globally, i.e., with η =. [This follows directly from the well-known fact that G(u) = min G if, and only if, G(u) = 0.] Lemma 2.7 (Comparing values growth with gradients growth). Let G C 1,1 loc (RN ; R) and u crit G. Assume there exists ε > 0 such that u B(u, 2ε) and G(u) = G(u) = u crit G, in other words (2.3) holds (with η = 2ε). Then there exists c > 0 such that for any u B(u, ε). G(u) G(u) c G(u) 2, (2.6) Proof. Working if necessary with G(u) = G(u) G(u), we may assume, without loss of generality, that G(u) = 0. Let us proceed in two steps. Step 1. Let H C 1,1( B(u, 2ε); R ) with u crit H and assume further that H 0. We claim that there exists c > 0 such that Denote by L 2 the Lipschitz constant of H on B(u, 2ε), let L 1 = u B(u, ε), H(u) c H(u) 2. (2.7) ( L1 = 0 or L 2 = 0 ) = H B(u,ε) 0 = (2.7), max H(u) and set L = L 1 + L 2. Since, u B(u,2ε) we may assume that L 2 > 0 and L 1 > 0. Let u B(u, ε). We have for any v B(0, 2ε), = H(v) H(u) = 1 0 1 0 H ( (1 t)u + tv ), v u dt H ( (1 t)u + tv ) H(u), v u dt + H(u), v u, 6

so that for any v B(0, 2ε), H(v) H(u) H(u), v u L 2 2 v u 2. (2.8) Note that ( u ε L H(u)) u u u + ε L1 L H(u) < ε+ε L < 2ε. By convexity, we infer that [ u, u ε L H(u)] B(u, 2ε). It follows that v = u ε L H(u) is an admissible choice in (2.8). Without loss of generality, we may assume that ε 1. This leads to 0 H(v) H(u) ε 2L H(u) 2. Whence the claim. Step 2. Define for any u B(u, 2ε), H(u) = G(u). Since ( G(u) = 0 = G(u) = 0 ), we easily deduce that H C 1 b( B(u, 2ε); R ) and for any u B(u, 2ε), H(u) = sign ( G(u) ) G(u). Denote by L2 the Lipschitz constant of G on B(u, 2ε). We claim that, H(u) H(v) L 2 u v, (2.9) for any (u, v) B(u, 2ε) B(u, 2ε). Let (u, v) B(u, 2ε) B(u, 2ε). Estimate (2.9) being clear if G(u)G(v) 0, we may assume that G(u)G(v) < 0. By the Mean Value Theorem and the assumptions on G, it follows that there exists t (0, 1) such that for w = (1 t)u + tv, G(w) = 0 and G(w) = 0. We then infer, H(u) H(v) = G(u) + G(v) G(u) + G(v) = G(u) G(w) + G(w) G(v) L 2 u w + L 2 w v = L 2 u v. Hence (2.9). It follows that H C 1,1( B(u, 2ε); R ) and H satisfies the assumptions of Step 1. Applying (2.7) to H, we get (2.6). This concludes the proof. Proposition 2.8 (Lower bound for desingularizing functions). Let G C 1,1 loc (RN ; R) and let u be a nontrivial critical point, i.e. u crit G \ int crit G. Assume that G satisfies the KL property at u and that assumption (2.3) holds at u. Then there exists β > 0 such that for any desingularizing function ϕ of G at u, for any small positive s. ϕ (s) β s, (2.10) Proof. We may assume G(u) = 0. Combining (2.2) and (2.6), we deduce that ϕ ( G(u) ) 1 G(u) β, for G(u) any u B(u, ε) such that G(u) G(u) (Remark 2.5). Changing G into G if necessary, there is no loss of generality to assume that there exists u n such that u n u with G(u n ) > 0 (recall u is a nontrivial critical point). Since G is continuous, this implies by a connectedness argument that for some ρ there exists r > 0 such that G ( B(u, ρ) ) (0, r). Using the parametrization s (0, r) we conclude that ϕ (s) β s, for any s sufficiently small. 2.2 Lower bounds for desingularizing functions of definable C 2 functions This part makes a strong use of definability arguments (these are recalled in the last section). Lemma 2.9 (Lower bounds for desingularizing functions of C 2 definable functions). Let G : Ω R be a C 2 definable function on an open subset Ω 0 of R N. We assume that 0 is a nontrivial critical point (6) and that G(0) = 0. 6 Equivalently, we assume that there exists u n n 0 such that G(u n) 0. 7

Since G is definable it has the KL property (7) that is, there exist η, r 0 > 0 and ϕ : [0, r 0 ) R as in Definition 2.1 such that ( ϕ G ) (u) 1, (2.11) for any u in B(0, η) such that G(u) 0. Then there exists c > 0 such that ϕ (s) c s, (2.12) so that ϕ(s) 2c s, for any small s > 0. Proof. Let us outline the ideas of the proof: after a simple reduction step, we show that the squared norm of a/the smallest gradient on a level line increases at most linearly with the function values. In the second step, we show that this estimate is naturally linked to the increasing rate of ϕ itself and to property (2.12). Let ϕ : [0, r 0 ) R be any desingularizing function of G at 0 on B(0, η), as in Definition 2.1. Changing G in G if necessary, we may assume by Definition 2.4, without loss of generality, that there exists a n sequence (u n ) n such that u n 0 and G(u n ) > 0, for any n N. Let us proceed with the proof in three steps. Step 1. We first modify the function G as follows. Let ρ C 2 (R N ; [0, 1]) be a semi-algebraic function such that { supp ρ B(0, η) Ω, ρ(x) = 1, if x B ( 0, η 2 ). Let us define Ĝ on RN by ρ(u)g(u) + dist ( u, B ( 0, η )) 3 2, if u Ω, Ĝ(u) = 0, if u R N \ Ω. It follows that Ĝ C2 (R N ; R), leaves the set of desingularizing functions at 0 unchanged, has compact lower level sets and is definable in the same structure (recall Definition A.1 (iii)). Finally, we obviously have, u n n 0 with Ĝ(u n) > 0, n N. (2.13) Without loss of generality, we may assume that η 1 and r 0 η3 8. Let u RN \ B(0, η). One has, It follows that, Step 2. For r > 0, we introduce ( ( Ĝ(u) = dist u, B 0, η )) 3 ( = u η ) 3 η 3 2 2 8 r 0. inf Ĝ(u) = min Ĝ(u), r (0, r 0). (2.14) u B(0,η) [Ĝ=r] u [Ĝ=r] (P r ) ψ(r) = min { } 1 2 Ĝ(u) 2 ; u R N, Ĝ(u) = r. Since the set of critical values of a definable function is finite and since the level sets are compact, we may choose, if necessary, r 0 so that ψ > 0 on (0, r 0 ) (the fact that 0 is a nontrivial critical point excludes the case when ψ vanishes around 0). If we denote by S(r) the nonempty compact set of solutions to (P r ), one easily sees that 7 See Theorem 2.2. S : (0, r 0 ) R N, 8

is a definable point-to-set mapping this follows by a straightforward use of quantifier elimination (i.e., by the use of Definition A.1). Using the Definable Selection Lemma (Lemma A.4), one obtains a definable curve u : (0, r 0 ) R N such that u(r) S(r), for any r (0, r 0 ). Finally, using the Monotonicity Lemma (Lemma A.3) repeatedly on the coordinates u i of u, one can shrink r 0 so that u is actually in C 1 ((0, r 0 ); R N ). Fix now r in (0, r 0 ). Since r is noncritical the problem (P r ) is qualified and we can apply Lagrange s Theorem for constrained problems. This yields the existence of a real multiplier λ(r) such that with of course Ĝ(u(r)) = r. 2 Ĝ(u(r)) Ĝ(u(r)) λ(r) Ĝ(u(r)) = 0, (2.15) Note that for any r (0, r 0 ), Ĝ(u(r)) 0 (as seen at the beginning of this step) so that λ(r) is an actual eigenvalue of 2 Ĝ(u(r)). Since Ĝ is C2, the curve 2 Ĝ(u(r)) is bounded in the space of matrices M N (R). Since eigenvalues depend continuously on operators, one deduces from the previous remarks that there exists λ 0 such that λ(r) λ, r (0, r 0 ). Multiplying (2.15) by u (r) gives 2 Ĝ(u(r)) Ĝ(u(r)), u (r) = λ(r) Ĝ(u(r)), u (r), which is nothing else than 1 d 2 dr Ĝ(u(r)) 2 = λ(r) d dr Ĝ(u(r)). Since Ĝ(u(r)) = r, one has 1 d 2 dr Ĝ(u(r)) 2 = λ(r), so after integration on [s, r] (0, r 0 ), one obtains r Ĝ(u(r)) 2 Ĝ(u(s)) 2 r,s 0 = 2 λ(τ)dτ 2λ r s 0. (2.16) s ( Ĝ(u(r)) 2 ) It follows that is a Cauchy s family, so that the limit l of Ĝ(u(s)) 2 as s goes to zero exists in s>0 n [0, ). We recall that by assumption (2.13), u n 0, Ĝ(u n) > 0 and Ĝ(u n) n 0. Now, setting r n = Ĝ(u n), one has by definition of u(r n ), Ĝ(u n) Ĝ(u(r n)). This implies that l = 0 and as a consequence (2.16) yields in other words 1 2 Ĝ(u(r)) 2 = Step 3. Let us now conclude. By KL inequality one has for any r (0, r 0 ), ϕ (r) r 0 λ(τ)dτ λr, (2.17) ψ(r) λr, r (0, r 0 ). (2.18) 1 u B(0, η) [G = r]. (2.19) Ĝ(u), As a consequence, we can use (2.14) in (2.19) and the linear estimate (2.18) above to conclude as follows: ϕ (r) = { Ĝ(u) ; 1 } inf u B(0, η) [Ĝ = r] { Ĝ(u) ; 1 } min u [Ĝ = r] 1 2ψ(r) c r, 9

for any r (0, r 0 ), with c = ( 2λ ) 1. Hence (2.12). Remark 2.10. (a) Note that if G C 2 then (2.12) does not hold. Indeed, take G(u) = u 3 2 and ϕ(s) = s 2 3 as a (semi-algebraic) counter-example. (b) When we omit the assumption that 0 is a nontrivial critical point, i.e. 0 int crit G, then G vanishes in a neighborhood of 0. In that case, the result is not true in general since any concave increasing function adequately regular is desingularizing for G. However a function ϕ(s) = c s can still be chosen as a desingularizing function. Hence, for an arbitrary C 2 definable function, we can always assume that for any critical point, the corresponding desingularizing function satisfies ϕ (s) c 1 s (locally for some positive constant c). 3 Damped second order gradient systems 3.1 Quasi-gradient structure and KL inequalities Definition 3.1. Let Γ be a nonempty closed subset of R N and let F : R N R N be a locally Lipschitz continuous mapping. (i) We say that the first order system u (t) + F ( u(t) ) = 0, t R +, (3.1) has a quasi-gradient structure for E on Γ, if there exist a differentiable function E : R N R and α Γ = α > 0 such that (angle condition) E(u), F (u) α E(u) F (u), for any u Γ, (3.2) (rest-points equivalence) crit E Γ = F 1 ({0}) Γ. (3.3) (ii) Equivalently a vector field F having the above properties is said to be quasi-gradient for E on Γ. The following result involves classical material and ideas, yet, the fact that an asymptotic alternative can be derived in this setting does not seem to be well-known (see however [2] in a discrete context). Theorem 3.2 (Asymptotic alternative for quasi-gradient fields). Let F : R N R N be a locally Lipschitz mapping that defines a quasi-gradient vector field for E on R N, for some differentiable function E : R N R. Assume further that the function E is KL. Let u be any solution to (3.1). Then, (i ) either u(t) t, (ii ) or u converges to a singular point u of F as t. When (ii ) holds then u L 1( (0, ); R N) and u (t) t 0. Moreover, we have the following estimate, u(t) u 1 α ϕ( E(u(t)) E(u ) ), (3.4) where ϕ is a desingularizing function of E at u and α is the constant in (3.2). Proof. We assume that (i) does not hold, so there exist u R N and a sequence s n such that u(s n ) n u. Note that by continuity of E, one has E ( u(s n ) ) n E(u ). Observe also that from equation (3.1) and the angle condition (3.2), one has for any t 0, d ( ) ( ) E u (t) = E u(t), u (t) dt = E ( u(t) ), F ( u(t) ) α E(u(t)) F (u(t)), (3.5) 10

and thus the mapping t E(u(t)) is nonincreasing, which implies lim E(u(t)) = E(u ). t Note that if E(u(t)) = E(u ) for some t, one would have d dt( E u ) (t) = 0 for any t > t, which would in turn imply, by (3.5), that E(u(t)) F (u(t)) = 0, for any such t. In view of the rest point equivalence (3.3), this would mean that F (u(t)) = 0, hence by uniqueness of solution curves, that u(t) = u for any t 0. We can thus assume without loss of generality that E(u(t)) > E(u ), t 0. (3.6) Let t 0 > 0 be such that u(t 0 ) B ( u, η ) ( ( 2 and ϕ E u(t0 ) ) E(u ) ) ( 0, ηα ) 2, where α > 0 is the constant in (3.2) [in view of our preliminary comments and of the continuity of E such a t 0 exists]. By continuity of u, there exists τ > 0 such that for any t [t 0, t 0 + τ), u(t) B(u, η). So we may define T (t 0, ] as { } T = sup t > t 0 ; s [t 0, t), u(s) B(u, η). By (3.5), the Kurdyka- Lojasiewicz inequality (2.2) and equation (3.1), we have for any t (t 0, T ), It follows from the above estimate that d ( ( ϕ E ( u(. ) ) E ( ) )) u (t) dt = ϕ ( E ( u(t) ) E(u ) ) d ( ) E u (t) dt α ϕ ( E ( u(t) ) E(u ) ) ( ) ( ) E u(t) F u(t) = α F ( u(t) ) ( ϕ ( E(. ) E(u ) )( u(t) ) α u (t). (3.7) u(t) u(t 0 ) t t 0 u ϕ ( E ( u(t 0 ) ) E(u ) ) (s) ds < η α 2, (3.8) for any t (t 0, T ). We claim that T =. Indeed, otherwise T < and (3.8) applies with t = T. Hence, u(t ) u u(t ) u(t 0 ) + u(t 0 ) u < η. Then u(t ) B(u, η), which contradicts the definition of T. As a consequence the curve u belongs to L 1( (t 0, ); R N) by (3.8) and the curve u converges to u by Cauchy s criterion. Finally since 0 must be a cluster point of u ( recall indeed u (t) dt < and u is uniformly continuous by (3.1) ), one must have F (u 0 ) = 0. The announced estimate follows readily from (3.8) and the fact that T =. Corollary 3.3. Let F : R N R N be locally Lipschitz continuous and assume that for any R > 0 the mapping F defines a quasi-gradient vector field for some differentiable function E R : R N R on B(0, R). Assume further that each of the functions E R is KL. Let u be any bounded solution to (3.1). Then u converges to a singular point u of F, u is integrable and converges to 0. In particular, if we take R sup { u(t) ; t [0, ) }, we have the following estimate, u(t) u 1 ( ) ϕ E R (u(t)) E R (u ), (3.9) α R where ϕ is a desingularizing function of E R at u and α R is the constant in (3.2), for the ball B(0, R). Proof. Take R sup { u(t) ; t [0, ) } and observe that the previous proof may be reproduced as it is: just replace E by E R. 11

3.2 Convergence rate of quasi-gradient systems and worst-case dynamics To simplify our presentation we consider first a proper gradient system: u (t) + E(u(t)) = 0, (3.10) where E : R N R is a twice continuously differentiable KL function. We assume that u is bounded so, by virtue of our previous considerations, the curve converges to some critical point u of E. Observe that if u is a trivial critical point, one actually has u(0) = u and the asymptotic study is trivial. We thus assume u to be nontrivial, and we denote by ϕ a desingularizing function of E at u. We set ψ = ϕ 1, whose domain is denoted by [0, a), (with a (0, ]) and we consider the one-dimensional worst-case gradient dynamics (see [8]): ν (t) + ψ (ν(t)) = 0, ν(0) = ν 0 (0, a). (3.11) We shall assume that ϕ (s) c s, on (0, r 0 ), (3.12) which implies that solutions ν to (3.11) are globally defined on [0, ) and satisfy lim t ν(t) = 0 with ν(t) ν 0 e c0t, for any t 0 (and for some c 0 > 0). Uniqueness holds by concavity of ϕ. Finally, note that if E is a C 2 definable function then ϕ can be chosen to be C 2, strictly concave and satisfying (3.12) (Remark 2.3 (c) and Lemma 2.9). Radial functions and worst-case dynamics. A full justification of the terminology worst-case dynamics is to be given further, but at this stage one can observe that E could be taken of the form E rad (u) = ϕ 1 ( u u ), with u B(u, η) (η > 0), provided that ϕ 1 is smooth enough. In that case ϕ is clearly desingularizing and the solutions of the gradient system (3.10) are radial in the sense that they are of the form (8) u(t) = u + ν(t) u 0 u u 0 u, (3.13) where ν is a solution to (3.11). In this case, the dynamics (3.11) exactly measures the convergence rates for (3.10), since one has for any t 0 and any u 0 such that ν(0) = u 0 u, E rad (u(t)) = ψ(ν(t)), (3.14) u(t) u = ν(t). (3.15) We are about to see that this behavior in terms of convergence rate is actually the worst we can expect. Remark 3.4. (a) As can be seen below, the worst-case gradient system is introduced to measure the rate of convergence of solutions for large t. Since nontrivial solutions to (3.11) have the same asymptotic behavior (they are, indeed, all of the form ν 1 (t) = ν(t + t 0 ) where t 0 is some real number), the choice of the initial condition ν(0) in (0, a) can be made arbitrarily. (b) The above rewrites ν (t)ϕ ( ϕ 1 (ν(t)) ) = 1. Thus if µ denotes an antiderivative of ϕ ϕ 1, one has ν(t) = µ 1 ( t + a 0 ) (where a 0 is a constant), for any t > 0 large enough. (c) In general, the explicit integration of such a system depends on the integrability properties of ψ and on the fact that ϕ ϕ 1 admits an antiderivative in a closed form. For instance if ϕ(s) = ( s c )θ, with c > 0 and θ ( ) 0, 1 1 2, then ψ(s) = cs θ and 8 Just use the formula in (3.10). ν (t) + c θ ν(t) 1 θ θ = 0, ν(0) (0, a). 12

Thus by integration with c 1 > 0. As a consequence, d 1 θ d dt ν1 θ (t) = dt ν 1+2θ θ (t) = c 1, ν(t) = ( c 2 + c 1 t ) θ 1 2θ, with c 2 > 0. When θ = 1 2 one easily sees that ν(t) = ν(0) exp ( 2ct). Theorem 3.5 (The worst-case rate and worst-case one-dimensional gradient dynamics). Let E C 2 (R N ; R) be a KL function, let u be a bounded solution to (3.10) and let u crit E satisfying u(t) t u (such a u exists by Theorem 3.2). Then for any t large enough, and where ν is a solution to (3.11). E(u(t)) E(u ) ψ(ν(t)), (3.16) u(t) u ν(t), (3.17) Proof. Without loss of generality, we may assume that E(u ) = 0. From the previous results, we know that for any t t 0, we have u(t) B(u, η) and E(u(t)) (0, r 0 ), so that the KL inequality gives (see Theorem 3.2 and (3.7)): d ( ) ϕ E(u) (t) u (t). dt Set z(t) = E(u(t)). Since d dt (E u)(t) = u (t) 2, one has d dt (ϕ z)(t) z (t), or equivalently ϕ ( z(t) ) 2 z (t) 1. Consider now the worst-case gradient system with initial condition ν(t 0 ) = ϕ ( E(u(t 0 )) ) and set z a (t) = ψ(ν(t)) = ϕ 1 (ν(t)), for t t 0. The system (3.11) becomes ϕ (z a (t))z a(t) + 1 ϕ (z a(t)) = 0, i.e., ϕ (z a (t)) 2 z a(t) = 1. If µ is an antiderivative of ϕ 2 on (0, r 0 ), it is an increasing function and one has d dt (µ z)(t) = ϕ ( z(t) ) 2 z (t) 1 = ϕ ( z a (t) ) 2 z a (t) = d dt (µ z a)(t), and µ(z(t 0 )) = µ(z a (t 0 )). As a consequence, µ(z(t)) µ(z a (t)), hence z(t) z a (t) for any t t 0, which is exactly (3.16). Using (3.4), we conclude by observing that The theorem is proved. u(t) u ϕ(e(u(t))) ϕ(z a (t)) = ν(t). Remark 3.6. Observe that in the case of a desingularizing function of power type (see Remark 3.4 (c)), we recover well-known estimates [33]. Theorem 3.7 (The worst-case one-dimensional gradient dynamics for quasi-gradient systems). Let F : R N R N be a locally Lipschitz continuous mapping that defines a quasi-gradient vector field for some function E C 2 (R N ; R) on B(0, R), for any R > 0. Assume further that the function E is KL and that for any R > 0, there exists a positive constant b > 0 such that E(u) b F (u), (3.18) for any u B(0, R). Assume further that for a given initial data u 0 R N the solution u to (3.1) converges to some rest point u. Denote by ϕ some desingularizing function for E at u. Then there exist some constants c, d > 0, t 0 R such that where ν is a solution to (3.11). u(t) u dν (ct + t 0 ), (3.19) 13

Proof. Combining the techniques used in Theorems 3.2 and 3.5, the proof is almost identical to that of Theorem 3.5. Without loss of generality, we may assume that E(u ) = 0. We simply need to check the following inequality which is itself a consequence of the assumption (3.18) applied with R = sup u(t). t>0 d dt (E u)(t) = u (t), E(u(t)) F (u(t)) E(u(t)) b F (u(t)) 2 b u (t) 2. From (3.7) one has d dt (ϕ E)(u(t)) α u (t), for any t sufficiently large. Setting z(t) = E(u(t)), one obtains d dt (ϕ z)(t) α b z (t). The conclusion follows as before by using a reparametrization of (3.11). Remark 3.8. Assumption (3.18) is of course necessary and simply means that the vector field F drives solutions to their rest points at least as fast as E (see also [20]). 3.3 Damped second order systems are quasi-gradient systems As announced earlier our approach to the asymptotic behavior of damped second order gradient system is based on the observation that (1.1) can be written as a system having a quasi-gradient structure. For G C 2 (R N ; R), let us define F : R N R N by Then (1.1) is equivalent to F(u, v) = ( v, γv + G(u) ). U (t) + F ( U(t) ) = 0, t R +, with U = (u, v). (3.20) As explained in the introduction the total energy function E T (u, v) = G(u) + 1 2 v 2 (sum of the potential energy and the kinetic energy) is a Liapunov function for our dynamical system (1.1). Formally E T (u, v), F(u, v) = γ v 2. From the above we see, that the damped system (1.1) is not quasi-gradient for E T since one obviously has a degeneracy phenomenon ET (u, v), F(u, v) = 0 whenever v = 0, (3.21) where in general E T (u, v) 0 and F(u, v) 0. The idea that follows consists in continuously deforming the level sets of E T, through a family of functions: E λ : R N R N R with E 0 = E T (λ denotes here a positive parameter), so that the angle formed between each of the gradients of the resulting functions E λ, λ > 0 and the vector F remains far away from π/2. In other words we seek for functions making F a quasi-gradient vector field. Proposition 3.9 (The second order gradient systems are quasi-gradient systems). Let G C 2 (R N ; R) and let γ > 0. For λ > 0, define E λ C 1 (R N R N ; R) by ( ) 1 E λ (u, v) = 2 v 2 + G(u) + λ G(u), v. For any R > 0, there exists λ 0 > 0 satisfying the following property. For any λ (0, λ 0 ], there exists α > 0 such that Eλ (u, v), F(u, v) α E λ (u, v) F(u, v), (3.22) for any (u, v) B(0, R) R N. Furthermore, for any λ [0, λ 0 ]. crit E λ ( B(0, R) R N ) = F 1 ({0}) ( B(0, R) R N ), (3.23) 14

Proof. For each (u, v) R N R N, we have E λ (u, v) = ( G(u) + λ 2 G(u)v, v + λ G(u) ). Let R > 0 be given and let M = max { 2 G(u) ; u B(0, R) }. Choose λ 0 > 0 small enough to have γ ) (M + γ2 λ 0 > 0. 2 Let λ (0, λ 0 ]. Then for any (u, v) B(0, R) R N, we obtain by Young s inequality, Eλ (u, v), F(u, v) = γ v 2 λ 2 G(u)v, v + λ G(u), γv + λ G(u) 2 ( γ Mλ 0 λ ) 0 2 γ2 v 2 + λ 2 G(u) 2 { where α 0 = min γ ) } (M + γ2 2 λ 0, λ 2 > 0. Moreover, α 0 ( v 2 + G(u) 2 ), (3.24) E λ (u, v) F(u, v) 1 2 E λ(u, v) 2 + 1 2 F(u, v) 2 C( v 2 + G(u) 2 ). (3.25) Combining (3.25) with (3.24), we deduce that the angle condition (3.22) is satisfied with α = α0 C. Finally, the rest point equivalence (3.23) follows from (3.24). Remark 3.10. Note that for λ = 0, we recover the total energy E T (u, v) = E 0 (u, v) = 1 2 v 2 + G(u). The following result is of primary importance: roughly speaking it shows that functions which desingularize the potential G at some critical point u, also desingularize the energy function E T and more generally the family of deformed functions E λ at the corresponding critical point (u, 0). This result implies in turn that the decay rate of the energy is essentially conditioned by the geometry of G as one might expect from a mechanical or an intuitive perspective. In the proposition below one needs the kinetic energy to be desingularized by ϕ. This explains our main assumption. Proposition 3.11 (Desingularizing functions of the energy). Let G C 2 (R N ; R), u crit G and assume that there exists a desingularizing function ϕ C 1( ) (0, r 0 ); R + of G at u on B(u, η) such that ϕ (s) c s, for any s (0, r 0 ). Then there exist λ 1 > 0, η 1 > 0 and c > 0 such that (ϕ 1 ) 2 E λ(.,. ) E λ (u, 0) (u, v) c, (3.26) for any λ [0, λ 1 ] and any (u, v) B(u, η 1 ) B(0, η 1 ) such that E λ (u, v) E λ (u, 0). Proof. By standard translation arguments, we may assume without loss of generality that G(u) = 0 and u = 0. Then E λ (0, 0) = 0 and (3.26) consists in showing that for some constant c > 0, ( ) 1 ϕ 2 E c λ(u, v) E λ (u, v), for any { λ [0, λ 1 ] and any }(u, v) B(0, η 1 ) B(0, { η 1 ) such that } E λ (u, v) 0. Recall that 0 crit G. Let M = max 2 1 G(u) ; u B(0, η) and define λ 1 = min 4, 1 2(M 2 +1). We have, E λ (u, v) 2 = G(u) + λ 2 G(u)v 2 + v + λ G(u) 2 G(u) 2 + v 2 λ 1 (M 2 + 1) v 2 2λ 1 G(u) 2 1 2( v 2 + G(u) 2), (3.27) 15

and in particular, G(u) 2 E λ (u, v), (3.28) for any λ [0, λ 1 ] and any (u, v) R N R N. Let now (λ, u, v) [0, λ 1 ] B(0, η) R N be such that E λ (u, v) 0. Since ϕ is nonincreasing, we have ( ) ( 1 1 ϕ 2 E λ(u, v) ϕ 2 E λ(u, v) E λ (u, 0) + 1 ) 2 E λ(u, 0) ϕ ( max { E λ (u, v) E λ (u, 0), E λ (u, 0) }). (3.29) Let us first find a lower bound on ϕ ( E λ (u, 0) ). Observe that necessarily E λ (u, 0) = G(u) 0. In particular, G(u) 0 (Remark 2.5). We then have by (2.2) and (3.28), E λ (u, v) 0 and ϕ ( E λ (u, 0) ) = ϕ ( G(u) ) for any λ [0, λ 1 ] and any (u, v) B(0, η) R N such that E λ (u, 0) 0. 1 G(u) 1 2 E λ (u, v), (3.30) Let us now estimate ϕ ( E λ (u, v) E λ (u, 0) ) in (3.29) under the assumption E λ (u, v) E λ (u, 0). Cauchy-Schwarz inequality implies that for any λ [0, λ 1 ], E λ (u, v) E λ (u, 0) 1 2( v 2 + λ 1 v 2 + λ 1 G(u) 2). (3.31) Combining (3.31) with (3.27), we deduce that for any λ [0, λ 1 ] and any (u, v) R N R N, E λ (u, v) E λ (u, 0) (1 + λ 1 ) E λ (u, v) 2. (3.32) By continuity of G, there exists η 1 (0, η) such that { } sup (1 + λ 1 ) E λ (u, v) 2 ; (λ, u, v) [0, λ 1 ] B(0, η 1 ) B(0, η 1 ) < r 0. Using successively the fact that ϕ is nonincreasing and ϕ (s) B(0, η 1 ) with E λ (u, v) E λ (u, 0) then E λ (u, v) 0 and ϕ ( E λ (u, v) E λ (u, 0) ) ϕ ( (1 + λ 1 ) E λ (u, v) 2) c s, it follows from (3.32) that if (u, v) B(0, η 1 ) c 1 E λ (u, v), (3.33) where c 1 > 0 is a constant. Finally, inequalities (3.30) and (3.33) together with (3.29) yield the existence of a constant c > 0 such that for any λ [0, λ 1 ] and any (u, v) B(0, η 1 ) B(0, η 1 ) such that E λ (u, v) 0, there holds E λ (u, v) 0 and ϕ ( 1 2 E λ(u, v) ) E λ (u, v) c, which is the desired result. 4 Convergence results Before providing our last results, we would like to recall to the reader that a bounded trajectory of (1.1) may not converge to a single critical point; finite-dimensional counterexamples for N = 2 are provided in [4, 41], in each case the trajectory of (1.1) ends up circling indefinitely around a disk. We now proceed to establish a central result whose specialization to various settings will provide us with several extensions of Haraux-Jendoubi s initial work [31]. Theorem 4.1. Let G C 2 (R N ; R) and (u 0, u 0) R N R N be a set of initial conditions for (1.1). Denote by u C 2( [0, ); R N ) the unique regular solution to (1.1) with initial data (u 0, u 0). Assume that the following holds. 1. (The trajectory is bounded) sup u(t) <. t>0 16

Then, 2. (Convergence to a critical point) G is a KL function. Each desingularizing function ϕ of G satisfies for any s (0, η 0 ), where β and η 0 are positive constants (see Definition 2.1). ϕ (s) β s, (4.1) (i) u and u belong to L 1( (0, ); R N) and in particular u converges to a single limit u in crit G. (ii) When u converges to u, we denote by ϕ the desingularizing function of G at u. One has the following estimate u(t) u cν(t), where ν is the solution of the worst-case gradient system ν (t) + (ϕ 1 ) (ν(t)) = 0, ν(0) > 0. Proof of Theorem 4.1. Let G C 2 (R N ; R), let (u 0, u 0) R N R N, let u C 2( [0, ); R N ) and let u R N. Set U(t) = ( u(t), u (t) ), U 0 = (u 0, u 0) and U = (u, 0). Let F and let E λ be defined as in Subsection 3.1 and Proposition 3.9, respectively. Note that if u crit G then U crit E λ and ϕ(t) = ct desingularizes E λ at U, for any λ 0 ( Remark 2.3 (a) and (3.23) ). Otherwise, u crit G and we shall apply Proposition 3.11. Since sup t>0 u(t) <, u (t) + γu (t) = A(t) where A is bounded. Thus, u (t) = u (0)e γt + t exp( γ(t s))a(s)ds, 0 and by a straightforward calculation, sup t>0 u (t) <. It follows that sup t>0 U(t) <. Let R = sup t>0 U(t). Let λ 0 > 0 and 0 < λ 1 < λ 0 be given by Propositions 3.9 and 3.11, respectively. Let us fix 0 < λ < λ 1 and let α > 0 be given by Proposition 3.9 for such E λ and R. By Proposition 3.9, the first order system U (t) + F ( U(t) ) = 0, t R +, (4.2) has a quasi-gradient structure for E λ on B(0, R) (Definition 3.1). Finally, since G has the KL property at u, E λ also has the KL property at U (Proposition 3.11). It follows that Theorem 3.2 applies to U, from which (i) follows. The estimate part of the proof of (ii) will follow from Theorem 3.7, if we establish that for any R > 0, there exists b > 0 such that for any (u, v) B(0, R) B(0, R), E λ (u, v) b F(u, v). First we observe that for each R > 0 and for any (u, v) B(0, R) B(0, R), there exists k 1 0 such that E λ (u, v) 2 k 1 ( G(u) 2 + v 2). (4.3) This follows trivially by Cauchy-Schwarz inequality and the fact that 2 G is continuous hence bounded on bounded sets. Fix σ > 0 and recall the inequality 2ab σ 2 a 2 + b2 σ for all real numbers a, b. By Cauchy-Schwarz inequality 2 and the previous inequality F(u, v) 2 = v 2 + γv + G(u) 2 (1 + γ 2 ) v 2 + G(u) 2 2 γv G(u) (1 + γ 2 ) v 2 + G(u) 2 σ 2 γv 2 1 σ 2 G(u) 2 = (1 (σ 2 1)γ 2 ) v 2 + (1 1σ ) 2 G(u) 2. ( Choosing σ > 1 so that 1 (σ 2 1)γ 2 > 0 yields k 2 > 0 such that F(u, v) 2 k 2 G(u) 2 + v 2), for any u, v in R N. Combining this last inequality with (4.3), we obtain E λ (u, v) 2 k1 k 2 F(u, v) 2, for any (u, v) B(0, R) B(0, R). Hence the result. Remark 4.2. (a) As announced previously convergence rates depend directly on the geometry of G through ϕ. (b) The fact that the length of the velocity curve u is finite suggests that highly oscillatory phenomena are unlikely. 17

5 Consequences In the following corollaries, the mapping R + t u(t) is a solution curve of (1.1). Corollary 5.1 (Convergence theorem for real-analytic functions [31]). Assume that G : R N R is realanalytic and let u be a bounded solution to (1.1). Then we have the following result. (i) (u, u ) has a finite length. In particular u converges to a critical point u. (ii) When u converges to u, we denote by ϕ(s) = cs θ ( with c > 0 and θ ( 0, 1 2]) the desingularizing function of G at u the quantity θ is the Lojasiewicz exponent associated with u. One has the following estimates. θ (a) u(t) u ct ( 1 2θ, with c > 0, when θ 0, 1 2). (b) u(t) u c exp( c t), with c, c > 0, when θ = 1 2. Proof. The proof follows directly from the original Lojasiewicz inequality [44, 43] and the fact that desingularizing functions for real-analytic functions are indeed of the form ϕ(s) = cs θ with θ (0, 1 2 ]. Hence (2.10) holds and Theorem 4.1 applies, see also Remark 3.4 (c). Corollary 5.2 (Convergence theorem for definable functions). Let O be an o-minimal structure that contains the collection of semi-algebraic sets. Assume G : R N R is C 2 and definable in O. Let u be a bounded solution to (1.1). Then we have the following result. (i) u and u belong to L 1( (0, ); R N ) and in particular u converges to a single limit u in crit G. (ii) When u converges to u we denote by ϕ the desingularizing function of G at u. One has the following estimate where ν is a solution of the worst-case gradient system u(t) u cν(t), ν (t) + (ϕ 1 ) (ν(t)) = 0, ν(0) > 0. Proof. G is a KL function by Kurdyka s version of the Lojasiewicz inequality. The fact that ϕ (s) c s comes from Lemma 2.9. So, Theorem 4.1 applies. Corollary 5.3 (Convergence theorem for the one-dimensional case [30]). Let G C 2 (R; R) and let u be a bounded solution to (1.1). Then u converges to a single point and we have the same type of rate of convergence as in the previous corollary. Proof. We proceed as in [49]. Argue by contradiction and assume that ω(u 0, u 0), the ω-limit set of (u 0, u 0), is not a singleton. Since ω(u 0, u 0) is connected in R, it is an interval and has a nonempty interior. Take u in the interior of ω(u 0, u 0) The Lojasiewicz inequality trivially holds at u for G 0 with ϕ(s) = s (recall u is interior). Apply then Theorem 4.1. Remark 5.4. In the one-dimensional case, convergence can be obtained with much more general forms of damping, see [15]. Corollary 5.5 (Convergence theorem for convex functions satisfying growth conditions). Let G C 2 (R N ; R) be a convex function such that argmin G def = { u R N ; G(u) = min G }, is nonempty (note that argmin G = crit G). Assume further that, for each minimizer x, there exists η > 0, such that G satisfies u B(x, η), G(u) min G + c dist(u, argmin G) r, (5.1) with r 1 and c > 0. Then the solution curve t (u(t), u (t)) has a finite length. In particular u converges to a minimizer u of G as t goes to. 18