Properly degenerate KAM theory (following V.I. Arnold)

Luigi Chierchia
Dipartimento di Matematica, Università Roma Tre
Largo S. L. Murialdo 1, I-00146 Roma (Italy)
luigi@mat.uniroma3.it

Gabriella Pinzari
Dipartimento di Matematica ed Applicazioni "R. Caccioppoli", Università di Napoli "Federico II"
Monte Sant'Angelo, Via Cinthia, I-80126 Napoli (Italy)
pinzari@mat.uniroma3.it

October 7, 2009

This paper is dedicated to the memory of Professor Nikolaĭ Nekhoroshev.

Abstract
Arnold's "Fundamental Theorem" on properly degenerate systems [3, Chapter IV] is revisited and improved, with particular attention to the relation between the perturbative parameters and to the measure of the Kolmogorov set. Relations with the planetary many-body problem are briefly discussed.

Keywords: KAM theory, Kolmogorov set, many-body problem, small divisors, invariant tori, degeneracies.

Contents
1 Introduction and Result
2 Tools: Averaging, Birkhoff normal form and two-scale KAM
3 Proof of Theorem 1.4
4 Proof of Theorem 1.3
A Averaging theory (Proposition 2.1)
B Two-scale KAM theory (proof of Proposition 2.3)
C Measure estimates (proof of Lemma 3.1)
References

Acknowledgments. The authors are grateful to Jacques Féjoz for many helpful discussions. This paper is based on the PhD thesis [1].

Partially supported by the Italian MIUR project "Metodi variazionali e equazioni differenziali nonlineari" and by the European Research Council under FP7 project "New connections between dynamical systems and Hamiltonian PDEs with small divisors phenomena".

1 Introduction and Result

1.1 A problem that one often encounters in applications of KAM theory is the presence of degeneracies. An important example, which actually motivated the birth of KAM theory, is the problem of finding a positive-measure set in phase space corresponding to quasi-periodic motions in the planetary $(1+n)$-body problem. This is the problem of $1+n$ point masses interacting only through a gravitational potential, modeling a system formed by a star and $n$ planets. In this case the integrable limit, namely the $n$ uncoupled two-body systems formed by the star and one planet, does not depend upon a full set of action variables ("proper degeneracy"). Therefore typical non-degeneracy conditions, such as Kolmogorov's non-degeneracy or Arnold's isoenergetic non-degeneracy, are strongly violated.

To deal with properly degenerate systems, V.I. Arnold developed in [3] a new KAM technique, which is summarized in what he called the "Fundamental Theorem" [3, Chapter IV]. Arnold then applied the Fundamental Theorem to the planar, planetary, nearly circular three-body problem ($n=2$). He proved, for the first time, the existence of relatively bounded motions for a positive-measure set of initial data.

A full proof of this result in the general spatial many-body problem turned out to be more difficult than expected. After an extension to the spatial three-body case [14], a first complete proof was published only in 2004 [9], using a different (smooth) KAM technique due to M.R. Herman; for a real-analytic proof, see [6]. In this paper we revisit and extend Arnold's Fundamental Theorem so as to weaken its hypotheses and to improve the measure estimates on the Kolmogorov set, i.e., the union of maximal invariant quasi-periodic tori.

1.2 In properly degenerate KAM theory it is not enough to make non-degeneracy assumptions on the unperturbed limit, as one does in standard KAM theory.
To describe a typical setting, let us consider a Hamiltonian function of the form
$$H(I,\varphi,p,q;\mu):=H_0(I)+\mu P(I,\varphi,p,q;\mu),\tag{1.1}$$
where¹ $(I,\varphi)\in V\times\mathbb T^{n_1}\subset\mathbb R^{n_1}\times\mathbb T^{n_1}$ and $(p,q)\in B\subset\mathbb R^{2n_2}$ are standard symplectic variables. Here $V$ is an open, connected set in $\mathbb R^{n_1}$ and $B$ is a $(2n_2)$-ball around the origin. The phase space
$$\mathcal P:=V\times\mathbb T^{n_1}\times B\tag{1.2}$$
has dimension $2n$, where $n:=n_1+n_2$, and is endowed with the standard symplectic two-form
$$\sum_{j=1}^{n_1}dI_j\wedge d\varphi_j+\sum_{j=1}^{n_2}dp_j\wedge dq_j=dI\wedge d\varphi+dp\wedge dq.$$

¹ $\mathbb T^n$ denotes the standard $n$-dimensional flat torus $\mathbb R^n/(2\pi\mathbb Z^n)$.

The Hamiltonian $H$ is assumed to be real-analytic. When the perturbative parameter $\mu$ is set to zero, the system is integrable but depends only on $n_1<n$ action variables; in the planetary case $\mu$ measures the ratio between the masses of the planets and that of the star. A typical further assumption is that the averaged (or "secular") perturbation
$$P_{\rm av}(p,q;I,\mu):=\int_{\mathbb T^{n_1}}P(I,\varphi,p,q;\mu)\,\frac{d\varphi}{(2\pi)^{n_1}}\tag{1.3}$$
has an elliptic equilibrium at the origin with respect to the variables $(p,q)$. Under suitable assumptions on the first- and/or second-order Birkhoff invariants (see [10] for general information), one can guarantee the existence of maximal KAM tori near the secular tori
$$\{I\}\times\mathbb T^{n_1}\times\mathbb T^{n_2}_\eta,\tag{1.4}$$

where $\eta=(\eta_1,\dots,\eta_{n_2})$, $\mathbb T^{n_2}_\eta$ denotes an $n_2$-dimensional torus given by the product of $n_2$ circles of radii $\eta_j>0$, and $\varepsilon:=\max_j\eta_j$ is small. More precisely, Arnold makes the following assumptions³:

(A1) $I\in V\mapsto\partial_IH_0$ is a diffeomorphism;

(A2) $P_{\rm av}(p,q;I)=P_0(I)+\sum_{i=1}^{n_2}\Omega_i(I)r_i+\frac12\sum_{i,j=1}^{n_2}\beta_{ij}(I)r_ir_j+\sum_{i,j,k=1}^{n_2}\lambda_{ijk}(I)r_ir_jr_k+o_6$, where $r_i:=\frac{p_i^2+q_i^2}{2}$ and $o_6/|(p,q)|^6\to0$ as $(p,q)\to0$;

(A3) the matrix of the second-order Birkhoff invariants is non-singular, i.e., $|\det\beta(I)|\ge{\rm const}>0$ for all $I\in V$.

We can now state Arnold's Fundamental Theorem. Denote by $B_\varepsilon=B^{2n_2}_\varepsilon=\{y\in\mathbb R^{2n_2}:|y|<\varepsilon\}$ the $2n_2$-ball of radius $\varepsilon$, and let
$$\mathcal P_\varepsilon:=V\times\mathbb T^{n_1}\times B_\varepsilon.\tag{1.5}$$
Recall the definitions of $H$ in (1.1) and of the phase space $\mathcal P$ in (1.2).

Theorem 1.1 (Arnold's Fundamental Theorem [3, p. 143]) Let $H$ be real-analytic on $\mathcal P$ and assume (A1)–(A3). Then there exists $\varepsilon_*>0$ such that, for
$$0<\varepsilon<\varepsilon_*,\qquad0<\mu<\varepsilon^8,\tag{1.6}$$
one can find a set $\mathcal K\subset\mathcal P_\varepsilon\subset\mathcal P$ formed by the union of $H$-invariant $n$-dimensional tori close to the secular tori in (1.4). On these tori the $H$-motion is analytically conjugated to linear Diophantine⁴ quasi-periodic motions. The set $\mathcal K$ has positive Liouville–Lebesgue measure and satisfies
$$\operatorname{meas}\mathcal K>(1-{\rm const}\,\varepsilon^a)\operatorname{meas}\mathcal P_\varepsilon,\qquad a:=\frac{1}{8(n+4)}.\tag{1.7}$$

Remark 1.1 By Birkhoff's theory (compare Proposition 2.2 below), the expansion in (A2) for $P_{\rm av}$ may be achieved under two assumptions. First, $(p,q)\mapsto P_{\rm av}(p,q;I)$ has an elliptic equilibrium at $p=q=0$. Second, the first-order Birkhoff invariants $\Omega_i$ are non-resonant up to order 6, i.e.⁵,
$$\Big|\sum_{j=1}^{n_2}\Omega_j(I)k_j\Big|\ge{\rm const}>0,\qquad\forall\,I\in V,\ 0<|k|\le6,\ k\in\mathbb Z^{n_2}.\tag{1.8}$$

In this paper we relax condition (1.6)² and replace assumption (A2) with either

(A2′) $(p,q)\mapsto P_{\rm av}(p,q;I)$ has an elliptic equilibrium at the origin $p=q=0$, and the first-order Birkhoff invariants are non-resonant up to order four, i.e., they verify (1.8) with 6 replaced by 4;

or

(A2″) $P_{\rm av}(p,q;I)=P_0(I)+\sum_{i=1}^{n_2}\Omega_i(I)r_i+\frac12\sum_{i,j=1}^{n_2}\beta_{ij}(I)r_ir_j+o_4$, with $r_i:=\frac{p_i^2+q_i^2}{2}$ and $o_4/|(p,q)|^4\to0$ as $(p,q)\to0$.

We shall prove the following two theorems.

² An interesting point is the relation between $\varepsilon$ and $\mu$, especially in view of physical applications; in the planetary case $\varepsilon$ measures the eccentricities and relative inclinations of the star–planet motions. This matter is discussed further below.
³ From now on we drop the dependence of the perturbation on $\mu$, assuming that such dependence is smooth enough, say $C^1$, and that the norms are uniform in $\mu$.
⁴ I.e., the flow is conjugated to the Kronecker flow $\theta\in\mathbb T^n\mapsto\theta+\omega t\in\mathbb T^n$, with $\omega=(\omega_1,\omega_2)$ satisfying (1.19) below.
⁵ Here and below, for integer vectors $k\in\mathbb Z^m$, $|k|:=|k|_1=\sum_{j=1}^m|k_j|$. See also the notation in Section 2.
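The non-resonance conditions (1.8) and (A2′) are finite checks at each fixed $I$: only the integer vectors with $0<|k|_1\le N$ enter, with $N=6$ or $N=4$. The following Python sketch computes the smallest divisor $\min|\Omega\cdot k|$ over that finite set. It assumes the values $\Omega(I)$ at one fixed $I$ are supplied as a list of floats; the function names and sample values are illustrative and are not taken from the paper.

```python
from itertools import product


def min_small_divisor(omega, order):
    """Return min |omega . k| over integer k with 0 < |k|_1 <= order.

    This is the quantity bounded from below in (1.8) (order 6)
    or in (A2') (order 4), at one fixed value of I.
    """
    best = float("inf")
    # The box |k_j| <= order contains the whole l1-ball of radius `order`;
    # vectors outside that ball (and k = 0) are skipped.
    for k in product(range(-order, order + 1), repeat=len(omega)):
        norm = sum(abs(kj) for kj in k)
        if 0 < norm <= order:
            best = min(best, abs(sum(w * kj for w, kj in zip(omega, k))))
    return best


def is_nonresonant(omega, order, threshold):
    """True if |omega . k| >= threshold for every 0 < |k|_1 <= order."""
    return min_small_divisor(omega, order) >= threshold


print(min_small_divisor([1.0, 2 ** 0.5], 4))
```

As an example, take $\Omega=(1,\sqrt2,-(1+\sqrt2))$, which satisfies the order-three relation $\Omega_1+\Omega_2+\Omega_3=0$. This is a toy analogue of the Herman resonance discussed in remark (iv) below. For this $\Omega$ the minimum vanishes as soon as order 3 is included, but it is about $0.414$ at order 2. In the same way, a relation of order $2n-1$ does not affect a condition of order four when $2n-1>4$.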

Theorem 1.2 Let $H$ be real-analytic on $\mathcal P$ and assume (A1), (A2′) and (A3). Then there exist positive numbers $\bar\varepsilon$, $\bar\mu$, $C$ and $b$ such that, for
$$0<\varepsilon<\bar\varepsilon,\qquad0<\mu<\bar\mu,\qquad\mu<\frac{1}{C(\log\varepsilon^{-1})^{2b}},\tag{1.9}$$
one can find a set $\mathcal K\subset\mathcal P$ formed by the union of $H$-invariant $n$-dimensional tori. On these tori the $H$-motion is analytically conjugated to linear Diophantine quasi-periodic motions. The set $\mathcal K$ has positive Liouville–Lebesgue measure and satisfies
$$\operatorname{meas}\mathcal P_\varepsilon>\operatorname{meas}\mathcal K>\Big(1-C\big(\sqrt\mu\,(\log\varepsilon^{-1})^b+\sqrt\varepsilon\big)\Big)\operatorname{meas}\mathcal P_\varepsilon.\tag{1.10}$$

The next theorem needs stronger hypotheses on $\mu$, but imposes no conditions on the first-order Birkhoff invariants.

Theorem 1.3 Let $H$ be real-analytic on $\mathcal P$ and assume (A1), (A2″) and (A3). Then there exist positive numbers $\bar\varepsilon$, $\bar\mu$, $C$ and $b$ such that, for
$$0<\varepsilon<\bar\varepsilon,\qquad0<\mu<\bar\mu,\qquad\mu<\frac{\varepsilon^6}{(\log\varepsilon^{-1})^{2b}},\tag{1.11}$$
one can find a set $\mathcal K\subset\mathcal P_\varepsilon$ formed by the union of $H$-invariant $n$-dimensional tori. On these tori the $H$-motion is analytically conjugated to linear Diophantine quasi-periodic motions. The set $\mathcal K$ has positive Liouville–Lebesgue measure and satisfies
$$\operatorname{meas}\mathcal P_\varepsilon>\operatorname{meas}\mathcal K>\big(1-C\sqrt\varepsilon\big)\operatorname{meas}\mathcal P_\varepsilon.\tag{1.12}$$

1.3 Let us make a few remarks.

(i) Under assumption (A2′), near $p=0=q$, the dynamics is approximated by the dynamics of the integrable secular (averaged and truncated) Hamiltonian
$$H_{\rm sec}:=H_0(I)+\mu\Big(P_0(I)+\sum_{i=1}^{n_2}\Omega_i(I)\frac{p_i^2+q_i^2}{2}\Big).\tag{1.13}$$
The phase space $\mathcal P$ is foliated by $n$-dimensional $H_{\rm sec}$-invariant tori as in (1.4), with $0<\varepsilon<\varepsilon_0$, where $\varepsilon_0$ denotes the radius of the ball $B$ in (1.2). Indeed, in this case the tori $\mathbb T^{n_2}_\eta$ are simply given by $\{p_j^2+q_j^2=\eta_j^2,\ 1\le j\le n_2\}$ with $\eta_j\le\varepsilon$. In the perturbed case, the fate of the secular tori may differ according to the relation between $\varepsilon$ and $\mu$.
Suppose first that $\mu<\varepsilon^\alpha$ with $\alpha>1$, which is the case, in particular, if (1.6) or (1.11) holds. Then $\mathcal K\subset\mathcal P_\varepsilon$, as in Arnold's Theorem. If instead $\mu>\varepsilon^\alpha$, then in general $\mathcal K$ is not contained in $\mathcal P_\varepsilon$, and the persistent tori need not be close to the secular tori $\{I\}\times\mathbb T^{n_1}\times\{p_j^2+q_j^2=\eta_j^2,\ 1\le j\le n_2\}$. Rather, they are close to the translated tori $\{I\}\times\mathbb T^{n_1}\times\{(p_j-p^0_j)^2+(q_j-q^0_j)^2=\eta_j^2,\ 1\le j\le n_2\}$. Here $(p^0_j,q^0_j)=(p^0_j(I;\mu),q^0_j(I;\mu))$ are the coordinates of a new equilibrium, which depends upon the full averaged system and may be logarithmically distant from the origin (as far as $1/\log\varepsilon^{-1}$). In any case, the set $\mathcal K$ almost completely fills a region diffeomorphic to $\mathcal P_\varepsilon$ and with the same measure. A precise geometrical description of the Kolmogorov set $\mathcal K$ is given in Step 6 of Section 3.

(ii) As mentioned above, in the planetary problem $\mu$ measures the mass ratio between the planets and the star, while $\varepsilon$ is related to the eccentricities and inclinations of the (instantaneous) two-body systems planet–star. Condition (1.9) is much weaker than Arnold's condition (1.6) and allows, at least in principle, applications to a wider class of planetary systems.

Clearly, to apply properly degenerate KAM theory to a concrete system such as the outer Solar System⁶, one should also estimate $\bar\varepsilon$ and $\bar\mu$ in (1.9), which would be quite a technical achievement⁷.

(iii) Arnold declared [3, end of p. 142] that he "made no attempt to achieve elegance or precision in evaluating constants", adding that "the reader can easily strengthen the results". However, the authors are not aware of improvements on Arnold's results in the full torsion case (compare the next item). This holds in particular for possibly sharp estimates on the measure of the Kolmogorov set arising in properly degenerate systems. In this respect, it would be interesting to know whether estimate (1.10) can be improved.

(iv) Relaxing (1.8), i.e., lowering to four the order of non-resonance to be checked, has an interesting application to the $(1+n)$-body problem. In fact, Herman and Féjoz showed [9] that, in the spatial case ($n_1=n$ and $n_2=2n$), the only linear relations satisfied by the first-order Birkhoff invariants $\Omega$ are, up to rearranging indices,
$$\Omega_{2n}=0,\qquad\sum_{j=1}^{2n-1}\Omega_j(I)=0.\tag{1.14}$$
The first relation is due to the rotation invariance of the system, while the second is usually called the "Herman resonance"⁸. In the spatial case the Herman resonance is of order $2n-1$. Hence, for $n\ge3$, it is not relevant for (A2′), although it is relevant for (1.8). Theorem 1.3 might be even more useful in this respect, since it involves no assumption on the $\Omega_j$. In possible applications to the spatial $(1+n)$-body problem, the Herman resonance therefore plays no rôle.

(v) The properly degenerate KAM theory developed in [9] (for the $C^\infty$ case) and in [6] (for the analytic case) is based on weaker non-degeneracy assumptions and is therefore different from Arnold's theory.
Roughly speaking, Arnold's approach is ultimately based on Kolmogorov's non-degeneracy condition ("full torsion" in a two-scale setting). The approach followed in [9, 6], which might be called "weak properly degenerate KAM theory", is instead based on the torsion of the frequency map. It exploits conditions studied by Arnold himself, Margulis, Pyartli, Parasyuk, Bakhtin and especially Rüßmann [15]; for a review, see [16].

The difference shows up in what has to be checked. For Arnold's properly degenerate theory one has to check that the matrix of the second-order Birkhoff invariants is non-singular (condition (A3) above). For the weak properly degenerate theory it is enough to check a generic property involving only the first-order Birkhoff invariants. Conditions (A2) and (A3) are then replaced by the requirement that the rescaled frequency map $I\in V\mapsto\hat\omega(I):=(\partial H_0(I),\Omega(I))$ is non-planar, i.e., that $\hat\omega(V)$ does not lie in any $(n-1)$-dimensional linear subspace of $\mathbb R^n$.

Incidentally, the presence of the resonances (1.14) makes a direct application of weak properly degenerate KAM theory to the spatial $(1+n)$-body problem in standard Poincaré variables difficult⁹. Explicit measure estimates on the set of persistent tori are not readily available in the context of weak properly degenerate KAM theory¹⁰.

⁶ In the outer Solar System (Sun, Jupiter, Saturn, Uranus and Neptune) $\mu$ is of order $10^{-3}$ and the largest eccentricity is about 0.05 (Saturn).
⁷ For partial results in this direction, see [5] and [11].
⁸ Compare also [1].
⁹ An application of the properly degenerate KAM theory developed in the present paper, using Deprit variables [7], will be the subject of a future paper by the authors.
¹⁰ Pyartli's theorem on the measure of Diophantine points on a non-planar curve is quantitative (compare [9, Théorème 55]). Even so, explicit measure estimates of the Kolmogorov set in the $N$-body problem, following the strategy in [9], do not appear completely obvious.

(vi) Let us briefly (and informally) recall Arnold's scheme of proof. First, by classical averaging theory (see, e.g., [2]), the Hamiltonian (1.1) is conjugated to a Hamiltonian $\tilde H$ satisfying, for any small¹¹ $\sigma>0$,
$$\tilde H=H_0+\mu P_{\rm av}+O(\mu^{2-\sigma}),\tag{1.15}$$
where $P_{\rm av}$ is as in (A2). Denote by $P^{[6]}_{\rm av}$ the truncation of $P_{\rm av}$ in $(p,q)$ at order 6. Then (1.15) can be rewritten, for $|(p,q)|<\varepsilon$, as
$$\tilde H=H_0+\mu P^{[6]}_{\rm av}+O(\mu\varepsilon^7)+O(\mu^{2-\sigma}).\tag{1.16}$$
In turn, if (1.6) holds, then $\mu^{1-\sigma}<\varepsilon^{8(1-\sigma)}\le\varepsilon^7$ for $\sigma\le1/8$, and (1.16) takes the form
$$\tilde H=H_0+\mu P^{[6]}_{\rm av}+O(\mu\varepsilon^7).$$
At this point, a two-time-scale KAM theorem can be applied.

The scheme of proof of Theorem 1.3 is similar. However, it uses more accurate estimates based on the averaging theory described in §2.1 below and, especially, on the two-scale KAM theorem described in §2.3 below. This last result is not available in the literature, and we include its proof in Appendix B.

To relax the relation between $\mu$ and $\varepsilon$ significantly, the above strategy has to be modified. The scheme of proof of Theorem 1.2 is the following:
step 1: averaging over the "fast angles" $\varphi$;
step 2: determination of the elliptic equilibrium for the "secular system";
step 3: symplectic diagonalization of the secular system;
step 4: Birkhoff normal form of the secular part;
step 5: global action-angle variables for the full system;
step 6: construction of the Kolmogorov set via an application of a two-scale KAM theorem, and estimate of its measure.

1.4 Properly degenerate systems naturally present two different scales. The first is a scale of order one, related to the unperturbed system (the typical velocity of the fast angles $\varphi$). The second is a scale of order $\mu$ (the typical size of the secular frequencies), related to the strength of the perturbation. Furthermore, a third scale appears naturally, namely the distance from the elliptic equilibrium in the $(p,q)$ variables. We now give a more technical and detailed statement, from which Theorem 1.2 follows at once.

Theorem 1.4 Under the same notation as in Theorem 1.2
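and assumptions (A1), (A2′) and (A3), let $\tau>n_1$ and $\tau_*>n:=n_1+n_2$¹², with $n_1,n_2$ positive integers. Then there exist $\bar\mu,\bar\varepsilon<1$ and $\gamma,C>1$ with the following property. Suppose (1.9) holds and $\bar\gamma$, $\gamma_1$, $\bar\gamma_2$ are taken so as to satisfy $\mu\bar\gamma_2\le\gamma_1$ and
$$
\begin{aligned}
&\gamma\max\Big\{\sqrt\mu\,(\log\varepsilon^{-1})^{\tau+1},\ \sqrt[3]{\mu\varepsilon}\,(\log\varepsilon^{-1})^{\tau+1},\ \varepsilon^2(\log\varepsilon^{-1})^{\tau+1}\Big\}<\bar\gamma<\gamma,\\
&\gamma\,\varepsilon^{5/2}<\gamma_1<\gamma,\\
&\gamma\,\varepsilon^{5/2}\Big(\log\big(\varepsilon^5/\gamma_1\big)^{-1}\Big)^{\tau_*+1}<\bar\gamma_2<\gamma\,\varepsilon^2;
\end{aligned}
\tag{1.17}
$$

¹¹ The appearance of the exponent $(2-\sigma)$, rather than the more natural exponent 2, is due to the presence of small divisors.
¹² In contrast with classical KAM theory, where the Diophantine exponent can be taken greater than $n-1$, here one needs $\tau_*>n$ (in [3] it is taken equal to $n+1$). This is due to the asymmetry of the frequency domain, which has $n_1$ dimensions of order one and $n-n_1=n_2$ dimensions that are small with the perturbative parameters.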

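then one can find a set $\mathcal K\subset\mathcal P$ formed by the union of $H$-invariant $n$-dimensional tori close to the secular tori in (1.4). On these tori the $H$-motion is analytically conjugated to linear Diophantine quasi-periodic motions. The set $\mathcal K$ has positive measure and satisfies
$$\operatorname{meas}\mathcal P_\varepsilon>\operatorname{meas}\mathcal K>\Big[1-C\Big(\bar\gamma+\gamma_1+\frac{\bar\gamma_2}{\varepsilon^2}+\sqrt\varepsilon\Big)\Big]\operatorname{meas}\mathcal P_\varepsilon.\tag{1.18}$$
Furthermore, the flow on each $H$-invariant torus in $\mathcal K$ is analytically conjugated to a translation $\psi\in\mathbb T^n\mapsto\psi+\omega t\in\mathbb T^n$. The vector $\omega=(\omega_1,\omega_2)\in\mathbb R^{n_1}\times\mathbb R^{n_2}$ is Diophantine: for each $k=(k_1,k_2)\in\mathbb Z^{n_1}\times\mathbb Z^{n_2}\setminus\{0\}$,
$$|\omega_1\cdot k_1+\omega_2\cdot k_2|\ge\begin{cases}\dfrac{\gamma_1}{|k|^{\tau_*}}&\text{if }k_1\ne0,\\[2mm]\dfrac{\mu\bar\gamma_2}{|k|^{\tau_*}}&\text{if }k_1=0,\ k_2\ne0.\end{cases}\tag{1.19}$$

To obtain Theorem 1.2 from Theorem 1.4 one can choose
$$\bar\gamma=\gamma\max\Big\{\sqrt\mu\,(\log\varepsilon^{-1})^{\tau+1},\ \varepsilon^{2/3}(\log\varepsilon^{-1})^{\tau+1}\Big\},\qquad\gamma_1=\bar\gamma_2=\gamma\,\varepsilon^{5/2};\tag{1.20}$$
then (1.10) follows easily¹³, with $b=\tau+1$. As already mentioned, the proof of Theorem 1.3 is simpler; it is given briefly in Section 4.

¹³ First, let us check that (1.17) holds. From (1.9) it follows that $\bar\gamma<\gamma$, provided $C>\gamma$ and $\varepsilon$ is small enough. The lower bound on $\bar\gamma$ is checked by treating the cases $\varepsilon\le\mu$ and $\mu<\varepsilon$ separately. The bounds on $\gamma_1$ are obvious, and the bounds on $\bar\gamma_2$ hold for $\gamma$ large enough. Thus (1.17) is verified. Finally, (1.20) and (1.18) easily imply (1.10).

2 Tools: Averaging, Birkhoff normal form and two-scale KAM

First of all we fix some notation, which will be used throughout the paper.
• In $\mathbb R^{n_1}$ we fix the 1-norm: $|I|:=|I|_1:=\sum_{1\le i\le n_1}|I_i|$.
• In $\mathbb T^{n_1}$ we fix the "sup-metric": $|\varphi|:=|\varphi|_\infty:=\max_{1\le i\le n_1}|\varphi_i|$ (mod $2\pi$).
• In $\mathbb R^{n_2}$ we fix the sup-norm: $|p|:=|p|_\infty:=\max_{1\le i\le n_2}|p_i|$ and $|q|:=|q|_\infty:=\max_{1\le i\le n_2}|q_i|$.
• For matrices we use the "sup-norm": $|\beta|:=|\beta|_\infty:=\max_{i,j}|\beta_{ij}|$.
• If $A\subset\mathbb R^{n_i}$, or $A\subset\mathbb T^{n_1}$, and $r>0$, we denote by $A_r:=\bigcup_{x\in A}\{z\in\mathbb C^{n_i}:|z-x|<r\}$ the complex $r$-neighborhood of $A$, with respect to the norms and metrics fixed above.
• Let $f$ be real-analytic on a complex domain of the form $U_v\times\mathbb T^m_s$, with $U\subset\mathbb R^d$. We denote by $\|f\|_{U_v\times\mathbb T^m_s}$, or simply $\|f\|_{v,s}$, its "sup-Fourier norm":
$$\|f\|_{v,s}:=\sum_{k\in\mathbb Z^m}\sup_{u\in U_v}|f_k(u)|\,e^{|k|s},\qquad|k|:=\sum_{1\le i\le m}|k_i|,$$
where $f_k(u)$ denotes the $k$-th Fourier coefficient of $f=\sum_{k\in\mathbb Z^m}f_k(u)e^{{\rm i}k\cdot\varphi}$.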

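• If $f$ is as in the previous item, $K>0$ and $\Lambda$ is a sublattice of $\mathbb Z^m$, then $T_Kf$ and $\Pi_\Lambda f$ denote, respectively, the $K$-truncation and the $\Lambda$-projection of $f$:
$$T_Kf:=\sum_{|k|\le K}f_k(u)e^{{\rm i}k\cdot\varphi},\qquad\Pi_\Lambda f:=\sum_{k\in\Lambda}f_k(u)e^{{\rm i}k\cdot\varphi}.$$
• If $f:A\subset\mathbb R^d\to\mathbb R^n$ is a Lipschitz function and $\rho>0$ is a weight, we denote its $\rho$-Lipschitz norm by
$$\|f\|^{\rm Lip}_{\rho,A}:=\rho^{-1}\sup_A|f|+\mathcal L(f),\qquad\mathcal L(f):=\sup_{I,I'\in A,\ I\ne I'}\frac{|f(I)-f(I')|}{|I-I'|}.\tag{2.21}$$
• $D_{\gamma_1,\gamma_2,\tau}\subset\mathbb R^{n_1+n_2}$ denotes the set of Diophantine $(\gamma_1,\gamma_2,\tau)$ numbers. These are the vectors $\omega\in\mathbb R^{n_1+n_2}$ satisfying inequality (1.19) for every $k=(k_1,k_2)\in\mathbb Z^{n_1+n_2}\setminus\{0\}$, with $\tau_*=\tau$, $\bar\gamma_2=\gamma_2$ and $\mu=1$. When $\gamma_1=\gamma_2=\gamma$, we obtain the usual Diophantine set $D_{\gamma,\tau}$.

2.1 Averaging theory

The first step of the proof of Theorem 1.4, and hence of Theorem 1.2, is based upon averaging theory. We follow the presentation given in [4, Appendix A], which in turn is based upon [13].

Proposition 2.1 (Averaging theory) Let $K$, $s$ and $\bar s$ be positive numbers such that $Ks\ge6$, and let $\alpha_1\ge\alpha_2>0$. Let $A\times B\times B'\subset(\mathbb R^{l_1}\times\mathbb R^{l_2})\times\mathbb R^m\times\mathbb R^m$, and let $v=(r,r_p,r_q)$ be a triple of positive numbers. Let $H:=h(I)+f(I,\varphi,p,q)$ be a real-analytic Hamiltonian on
$$W_{v,s+\bar s}:=A_r\times B_{r_p}\times B'_{r_q}\times\mathbb T^{l_1+l_2}_{s+\bar s}.$$
Finally, let $\Lambda$ be a (possibly trivial) sublattice of $\mathbb Z^{l_1+l_2}$, and let $\omega=(\omega_1,\omega_2)$ denote the gradient $(\partial_{I_1}h,\partial_{I_2}h)\in\mathbb R^{l_1+l_2}$. For $k=(k_1,k_2)\in\mathbb Z^{l_1}\times\mathbb Z^{l_2}$, assume that
$$|\omega\cdot k|\ge\begin{cases}\alpha_1&\text{if }k_1\ne0,\\\alpha_2&\text{if }k_1=0,\end{cases}\qquad\forall\,I\in A_r,\ \forall\,k\notin\Lambda,\ |k|\le K,\tag{2.22}$$
$$E:=\|f\|_{v,s+\bar s}<\frac{\alpha_2d}{2^7c_m},\qquad d:=\min\{rs,\ r_pr_q\},\qquad c_m:=\frac{e(1+em)}{2}.\tag{2.23}$$
Then there exists a real-analytic, symplectic transformation
$$\Psi:(I',\varphi',p',q')\in W_{v/2,\,\bar s+s/6}\mapsto(I,\varphi,p,q)\in W_{v,\,s+\bar s}\tag{2.24}$$
such that $H_*:=H\circ\Psi=h+g+f_*$. Here $g$ is in normal form and $f_*$ is small:
$$g=\sum_{k\in\Lambda}g_k(I',p',q')e^{{\rm i}k\cdot\varphi'},\quad\|g-\Pi_\Lambda T_Kf\|_{v/2,\,\bar s+s/6}\le\frac{12}{11}\,\frac{2^7c_mE^2}{\alpha_2d}\le\frac E4,\quad\|f_*\|_{v/2,\,\bar s+s/6}\le e^{-Ks/6}\,\frac{2^9c_mE^2}{\alpha_2d}\le e^{-Ks/6}E.\tag{2.25}$$
Moreover, for $z=I_1,I_2,\varphi,p$ or $q$, denote by $z=z(I',\varphi',p',q')$ the projection of $\Psi(I',\varphi',p',q')$ onto the $z$-variables. Then
$$\max\{\alpha_1s|I_1-I_1'|,\ \alpha_2s|I_2-I_2'|,\ \alpha_2r|\varphi-\varphi'|,\ \alpha_2r_q|p-p'|,\ \alpha_2r_p|q-q'|\}\le9E.\tag{2.26}$$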
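The Python sketch below makes the notation of this section concrete. It evaluates the sup-Fourier norm $\|f\|_{v,s}$, the truncation $T_Kf$ and the projection $\Pi_\Lambda f$, and it tests the two-scale Diophantine inequalities (1.19) that define $D_{\gamma_1,\gamma_2,\tau}$. It rests on a few assumptions of ours, not of the paper:

- A function is represented as a dict mapping integer tuples $k$ to callables $u\mapsto f_k(u)$.
- The supremum over $U_v$ is replaced by a maximum over a few sample points, so the computed norm is only a lower estimate.
- The sublattice $\Lambda$ is passed as a membership test.
- The Diophantine test only covers $0<|k|\le k_{\max}$. It can detect violations, but passing it does not prove membership.

All names are hypothetical.

```python
import math
from itertools import product


def l1_norm(k):
    """|k| := |k|_1, the integer-vector norm used throughout."""
    return sum(abs(kj) for kj in k)


def sup_fourier_norm(coeffs, samples, s):
    """Approximate ||f||_{v,s} = sum_k sup_u |f_k(u)| e^{|k| s}.

    coeffs maps integer tuples k to callables u -> f_k(u); the sup over the
    complex domain U_v is replaced by a max over the given sample points.
    """
    return sum(max(abs(fk(u)) for u in samples) * math.exp(l1_norm(k) * s)
               for k, fk in coeffs.items())


def truncate(coeffs, K):
    """T_K f: keep the Fourier modes with |k| <= K."""
    return {k: fk for k, fk in coeffs.items() if l1_norm(k) <= K}


def project(coeffs, in_lattice):
    """Pi_Lambda f: keep the modes k in Lambda (given as a membership test)."""
    return {k: fk for k, fk in coeffs.items() if in_lattice(k)}


def is_two_scale_diophantine(omega1, omega2, gamma1, gamma2, tau, k_max):
    """Finite test of (1.19) with mu = 1, i.e. of membership in
    D_{gamma1,gamma2,tau}, for integer vectors with 0 < |k| <= k_max."""
    n1 = len(omega1)
    omega = list(omega1) + list(omega2)
    for k in product(range(-k_max, k_max + 1), repeat=len(omega)):
        norm = l1_norm(k)
        if norm == 0 or norm > k_max:
            continue
        value = abs(sum(w * kj for w, kj in zip(omega, k)))
        # gamma1 applies when the fast part k1 is nonzero, gamma2 otherwise.
        bound = (gamma1 if any(k[:n1]) else gamma2) / norm ** tau
        if value < bound:
            return False
    return True
```

Splitting $k$ into its first $n_1$ components and the rest mirrors the two cases $k_1\ne0$ and $k_1=0$ of (1.19).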

This proposition is essentially Proposition A.1 of [4], with two slight improvements. The first improvement is trivial: it introduces the parameter $\bar s$, so as to separate the rôle of the analyticity loss in the angle variables from the initial angle domain. Such a variation is important, for example, when the Averaging Theorem is applied infinitely many times. The second improvement is a bit more delicate, and we use it in the proof of Proposition 2.3 below: it concerns the separation of two scales in the frequencies $\omega=\partial_Ih$.

Proposition 2.1 also holds in two special cases. The first is $l_1\ne0$, $l_2=0$ (only one action scale), in which case $\alpha:=\alpha_1=\alpha_2$. The second is $m=0$ (no $(p,q)$ variables), in which case one can take $d=rs$ and $c_m=c_0=e/2$. In what follows, Proposition 2.1 is applied twice: in Step 1 of Section 3 (with $l_1=n_1$, $l_2=0$, $m=n_2$) and in Appendix B (with $m=0$). Proposition 2.1 is proved in Appendix A.

2.2 Birkhoff normal form

We now recall a fundamental result due to Birkhoff on normal forms; we follow [10].

Proposition 2.2 (Birkhoff normal form) Let $\alpha>0$ and $s\ge3$. Let $\Omega=(\Omega_1,\dots,\Omega_m)\in\mathbb R^m$ be non-resonant of order $s$, i.e.,
$$|\Omega\cdot k|\ge\alpha>0,\qquad\forall\,k\in\mathbb Z^m\ \text{with}\ 0<|k|\le s,\tag{2.27}$$
and let $z=(p,q)\in B^{2m}_{\varepsilon_0}=\{z:|z|<\varepsilon_0\}\subset\mathbb R^{2m}\mapsto H(z)$ be a real-analytic function of the form
$$H(z)=\sum_{i=1}^m\Omega_ir_i+O(|z|^3),\qquad r_i:=\frac{p_i^2+q_i^2}{2}.\tag{2.28}$$
Then there exist $0<\varepsilon\le\varepsilon_0$ and a real-analytic, symplectic¹⁴ transformation
$$\phi:\tilde z=(\tilde p,\tilde q)\in B^{2m}_\varepsilon\mapsto\tilde z+\hat z(\tilde z)\in B^{2m}_{\varepsilon_0}$$
which puts $H$ into Birkhoff normal form up to order $s$, i.e.¹⁵,
$$\tilde H:=H\circ\phi=\sum_{i=1}^m\Omega_i\tilde r_i+\sum_{j=2}^{[s/2]}Q_j(\tilde r)+O(|\tilde z|^{s+1}).\tag{2.29}$$
Here, for $2\le j\le[s/2]$, the $Q_j$ are homogeneous polynomials of degree $j$ in $\tilde r=(\tilde r_1,\dots,\tilde r_m)$, with $\tilde r_i:=(\tilde p_i^2+\tilde q_i^2)/2$. The polynomials $Q_j$ do not depend on $\phi$.

Following the proof of this classical result as presented in [10], one can easily obtain the following useful amplifications.

1. The construction of the transformation $\phi$ is iterative and can be described as follows. There exist positive numbers $\varepsilon:=\varepsilon_{s-2}<\varepsilon_{s-3}<\dots<\varepsilon_0$ and symplectic transformations
$$\phi_i:\tilde z=(\tilde p,\tilde q)\in B^{2m}_{\varepsilon_i}\mapsto\tilde z+\hat z_{i-1}(\tilde z)\in B^{2m}_{\varepsilon_{i-1}},\qquad1\le i\le s-2,$$
with the following properties: $\phi=\hat\phi_{s-2}:=\phi_1\circ\dots\circ\phi_{s-2}$; $H\circ\hat\phi_i$ is in Birkhoff normal form up to order $i+2$; and
$$\sup_{B^{2m}_{\varepsilon_i}}|\hat z_{i-1}|\le c_{i-1}\frac{m_{i-1}}{\alpha}(\varepsilon_{i-1})^{i+1},$$

¹⁴ With respect to the standard form $dp\wedge dq=\sum_{1\le i\le m}dp_i\wedge dq_i$.
¹⁵ $[x]$ denotes the integer part of $x$.

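where the $c_{i-1}$ depend only on the dimension $m$, and the $m_{i-1}$ are defined as follows. For $i-1=0$, let $P_0$ be the homogeneous polynomial of degree 3 for which $H(z)-\sum_{j=1}^m\Omega_jr_j=P_0+O(|z|^4)$. For $i-1\ge1$, let $P_{i-1}$ be the homogeneous polynomial of degree $i+2$ for which
$$H(z)\circ\hat\phi_{i-1}-\sum_{j=1}^m\Omega_jr_j-\sum_{j=2}^{[(i+1)/2]}Q_j(r)=P_{i-1}+O(|z|^{i+3}).$$
Write
$$P_{i-1}=\sum_{|\alpha|+|\beta|=i+2}c_{\alpha,\beta}\prod_{j=1}^m(p_j+{\rm i}q_j)^{\alpha_j}(p_j-{\rm i}q_j)^{\beta_j},\qquad{\rm i}:=\sqrt{-1}.$$
Then
$$m_{i-1}:=\max_{\alpha,\beta:\ \alpha\ne\beta}|c_{\alpha,\beta}|.\tag{2.30}$$

2. Proposition 2.2 can easily be extended to the case of a real-analytic function $H(z;I)=\sum_{i=1}^m\Omega_i(I)r_i+O(|z|^3)$ which also depends on suitable action variables $I$. The setting is as follows:
- $A$ is an open subset of $\mathbb R^n$, and $\rho_0$, $\sigma_0$, $\varepsilon_0$ are positive numbers;
- $(I,\varphi)$ and $z=(p,q)$, with $(I,\varphi,z)\in A_{\rho_0}\times\mathbb T^n_{\sigma_0}\times B^{2m}_{\varepsilon_0}$, are conjugate couples of symplectic variables with respect to the standard form $dI\wedge d\varphi+dp\wedge dq$;
- $\Omega=(\Omega_1,\dots,\Omega_m)$ is a suitable real-analytic function on $A_{\rho_0}$ verifying (2.27) on $A_{\rho_0}$.

Then one can prove the following, for suitable $0<\varepsilon=\varepsilon_{s-2}<\dots<\varepsilon_0$, $0<\sigma=\sigma_{s-2}<\dots<\sigma_0$, $0<\rho=\rho_{s-2}<\dots<\rho_0$ and $c_i$. There exist $s-2$ real-analytic, symplectic transformations, still denoted $\phi_i$,
$$\phi_i:(\tilde I,\tilde\varphi,\tilde z)\in A_{\rho_i}\times\mathbb T^n_{\sigma_i}\times B^{2m}_{\varepsilon_i}\mapsto\big(\tilde I,\ \tilde\varphi+\hat\varphi_{i-1}(\tilde z;\tilde I),\ \tilde z+\hat z_{i-1}(\tilde z;\tilde I)\big)\in A_{\rho_{i-1}}\times\mathbb T^n_{\sigma_{i-1}}\times B^{2m}_{\varepsilon_{i-1}},$$
such that (2.29) holds with $\phi=\hat\phi_{s-2}=\phi_1\circ\dots\circ\phi_{s-2}$ and $\Omega_i=\Omega_i(I)$. The polynomials are now homogeneous polynomials $Q_j(\tilde r;I)$ of degree $j$ in $\tilde r=(\tilde r_1,\dots,\tilde r_m)$, whose coefficients are analytic functions on $A_\rho$. At each step, the functions $(\tilde z;\tilde I)\mapsto\hat z_{i-1}(\tilde z;\tilde I)$ and $(\tilde z;\tilde I)\mapsto\hat\varphi_{i-1}(\tilde z;\tilde I)$ verify
$$\sup_{B^{2m}_{\varepsilon_i}\times A_{\rho_i}}|\hat z_{i-1}|\le c_{i-1}\frac{m_{i-1}}{\alpha}(\varepsilon_{i-1})^{i+1},\qquad\sup_{B^{2m}_{\varepsilon_i}\times A_{\rho_i}}|\hat\varphi_{i-1}|\le c_{i-1}\frac{m_{i-1}}{\alpha\rho_0}(\varepsilon_{i-1})^{i+2}.\tag{2.31}$$
Here, for any fixed $I\in A_{\rho_{i-1}}$, the $m_{i-1}(I)$ are defined as in (2.30) with $c_{\alpha,\beta}=c_{\alpha,\beta}(I)$, and $m_{i-1}:=\sup_{A_{\rho_{i-1}}}m_{i-1}(I)$.

2.3 Two-scale KAM theory

The invariant tori of Theorems 1.3 and 1.4 will be obtained by applying a KAM theorem adapted to two different frequency scales, which is described in the following proposition.

Proposition 2.3 (Two-scale KAM Theorem) Let $n_1,n_2\in\mathbb N$, $n:=n_1+n_2$, $\tau>n$, $\gamma_1\ge\gamma_2>0$, $0<4s\le\bar s<1$, $\rho>0$, $D\subset\mathbb R^{n_1}\times\mathbb R^{n_2}$ and $A:=D_\rho$. Let $H(J,\psi)=h(J)+f(J,\psi)$ be real-analytic on $A\times\mathbb T^n_{s+\bar s}$. Assume that $\omega_0:=\partial h$ is a diffeomorphism of $A$ with non-singular Hessian matrix $U:=\partial^2h$. Let $\hat U$ denote the $n_2\times n$ submatrix of $U$, i.e., the matrix with entries $\hat U_{ij}=U_{ij}$ for $n_1+1\le i\le n$, $1\le j\le n$. Let
$$M\ge\sup_A\|U\|,\qquad\hat M\ge\sup_A\|\hat U\|,\qquad\bar M\ge\sup_A\|U^{-1}\|,\qquad E\ge\|f\|_{\rho,s+\bar s};$$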

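define
$$\hat c:=\max\{8\cdot4^{\tau+1}n,\ 6\},\qquad K:=\frac6s\log_+\Big(\frac{EM^2}{\gamma_1^2}\Big)^{-1},\quad\text{where }\log_+a:=\max\{1,\log a\},$$
$$\hat\rho:=\min\Big\{\frac{\gamma_1}{3MK^{\tau+1}},\ \frac{\gamma_2}{3\hat MK^{\tau+1}},\ \rho\Big\},\qquad L:=\max\{\bar M,\ M^{-1},\ \hat M^{-1}\},\qquad\hat E:=\frac{EL}{\hat\rho^2}.$$
Finally, let $\bar M_1$ and $\bar M_2$ be upper bounds on the norms of the $n_1\times n$ and $n_2\times n$ submatrices of $U^{-1}$ formed by its first $n_1$ and its last $n_2$ rows, respectively¹⁶. Assume the perturbation $f$ is so small that the following "KAM condition" holds:
$$\hat c\,\hat E<1.\tag{2.32}$$
Then, for any $\omega\in\Omega_*:=\omega_0(D)\cap D_{\gamma_1,\gamma_2,\tau}$, one can find a unique real-analytic embedding
$$\phi_\omega:\vartheta\in\mathbb T^n\mapsto\big(v(\vartheta;\omega),\ \vartheta+u(\vartheta;\omega)\big)\in{\rm Re}(D_r)\times\mathbb T^n,\qquad r:=20n\hat E\hat\rho,\tag{2.33}$$
such that $\mathcal T_\omega:=\phi_\omega(\mathbb T^n)$ is a real-analytic $n$-dimensional $H$-invariant torus. On this torus the $H$-flow is analytically conjugated to $\vartheta\mapsto\vartheta+\omega t$. Furthermore, the map $(\vartheta;\omega)\mapsto\phi_\omega(\vartheta)$ is Lipschitz and one-to-one, and the invariant set $\mathcal K:=\bigcup_{\omega\in\Omega_*}\mathcal T_\omega$ satisfies the measure estimate
$$\operatorname{meas}\big({\rm Re}(D_r)\times\mathbb T^n\setminus\mathcal K\big)\le c_n\Big(\operatorname{meas}\big((D\setminus\bar D_{\gamma_1,\gamma_2,\tau})\times\mathbb T^n\big)+\operatorname{meas}\big(({\rm Re}(D_r)\setminus D)\times\mathbb T^n\big)\Big).\tag{2.34}$$
Here $\bar D_{\gamma_1,\gamma_2,\tau}$ denotes the $\omega_0$-preimage of $D_{\gamma_1,\gamma_2,\tau}$ in $D$, and $c_n$ can be taken to be $c_n=1+(1+8n\hat E)^n$. Finally, the following uniform estimates hold on $\mathbb T^n\times\Omega_*$:
$$|v_1(\cdot;\omega)-I^0_1(\omega)|\le10n\,\bar M_1(M+\hat M)\hat E\hat\rho,\qquad|v_2(\cdot;\omega)-I^0_2(\omega)|\le10n\,\bar M_2(M+\hat M)\hat E\hat\rho,\qquad|u(\cdot;\omega)|\le\hat Es.\tag{2.35}$$
Here $v_i$ denotes the projection of $v\in\mathbb R^{n_1}\times\mathbb R^{n_2}$ onto $\mathbb R^{n_i}$, and $I^0(\omega)=(I^0_1(\omega),I^0_2(\omega))\in D$ is the $\omega_0$-preimage of $\omega\in\Omega_*$. This result is proved in Appendix B.

¹⁶ I.e., $\bar M_i\ge\sup_{D_\rho}\|T_i\|$, $i=1,2$, if $U^{-1}=\begin{pmatrix}T_1\\T_2\end{pmatrix}$.

3 Proof of Theorem 1.4

In this section we prove Theorem 1.4, and hence Theorem 1.2 (compare the remark following the statement of Theorem 1.4 in §1.4). In what follows, $C$ denotes suitable positive constants greater than one. They are independent of $\varepsilon$, $\mu$, $\bar\gamma$, $\gamma_1$, $\bar\gamma_2$, but may depend on $n_1$, $n_2$, $H_0$, $s_0$, etc. Without loss of generality, we may assume that $H$ has an analytic extension to a domain $\mathcal P_{\rho_0,\varepsilon_0,s_0}:=V_{\rho_0}\times\mathbb T^{n_1}_{s_0}\times B_{\varepsilon_0}$ with $s_0<1$ and with $\omega_0:=\partial H_0$ a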

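diffeomorphism of $V_{\rho_0}$. Up to a change in the definition of $\mu$, we may also assume that the perturbation $P$ has sup-Fourier norm $\|P\|_{\rho_0,\varepsilon_0,s_0}\le1$.

Preliminary step. In view of (A2′), we can assume that the quadratic part of $P_{\rm av}(p,q;I)$ is in the standard form $P_0(I)+\sum_{1\le i\le n_2}\Omega_i(I)r_i+o_2$, where the $\Omega_i(I)$ are the first-order Birkhoff invariants; compare [17]. Furthermore, again by (A2′), the $\Omega_i(I)$ are non-resonant up to order four. Hence, by Birkhoff theory (compare §2.2 above), one can find a symplectic transformation, $O(|(p,q)|^2)$-close to the identity, which transforms the original Hamiltonian into¹⁷ the form (1.1), with $P_{\rm av}$ in the standard form (A2″).

Step 1 (Averaging over the "fast angles" $\varphi$). Let $0<\varepsilon<e^{-1/5}$. The first step consists in removing the dependence of $H$ on $\varphi$ up to high orders, namely up to $O(\mu\varepsilon^5)$. To do this, we use averaging theory (Proposition 2.1 above) with
$$l_1=n_1,\ l_2=0,\ m=n_2,\ h=H_0,\ f=\mu P,\ B=B'=\{0\},\ r_p=r_q=\varepsilon_0,\ s=s_0,\ \bar s=0,\ \Lambda=\{0\},\ A=D,\ r=\bar\rho,$$
and with $K$ such that
$$e^{-Ks_0/6}:=\varepsilon^5,\quad\text{i.e.,}\quad K=\frac{30}{s_0}\log\varepsilon^{-1}.\tag{3.36}$$
Here $D$ and $\bar\rho$ are defined as follows. Let $\tau>n_1$, $\gamma\ge\max\{1,\ 2^5(30/s_0)^{\tau+1}c_{n_2}M\}$ and $\bar\gamma\ge\gamma\sqrt\mu\,(\log\varepsilon^{-1})^{\tau+1}$. Then take
$$D:=\omega_0^{-1}(D_{\bar\gamma,\tau})\cap V,\qquad\bar\rho:=\min\Big\{\frac{\bar\gamma}{2MK^{\tau+1}},\ \rho_0\Big\},\qquad M:=\max_{i,j}\sup_{V_{\rho_0}}|\partial^2_{I_iI_j}H_0(I)|,\tag{3.37}$$
where $D_{\bar\gamma,\tau}\subset\mathbb R^{n_1}$ is defined just before §2.1. The Diophantine inequality with $|k|=1$ gives $\bar\gamma\le|\omega_0|$ on $D$, so that $\gamma\sqrt\mu\,(\log\varepsilon^{-1})^{\tau+1}\le\bar\gamma\le\sup_{V_{\rho_0}}|\omega_0|$. By the choice of $D$, the following standard measure estimate holds:
$$\operatorname{meas}(V\setminus D)\le C\bar\gamma\operatorname{meas}(V),\tag{3.38}$$
where $C$ depends on the $C^1$ norm of $H_0$. By the previous choices, the unperturbed frequency map $\omega_0=\partial H_0$ verifies (2.22) on $D_{\bar\rho}$, with $\alpha_1=\alpha_2=\bar\alpha:=\bar\gamma/(2K^\tau)$. In fact, since $|\partial^2H_0|\le M$ entrywise,
$$\inf_{J\in D_{\bar\rho},\ 0<|k|\le K}|\omega_0(J)\cdot k|\ge\inf_{I\in D,\ 0<|k|\le K}|\omega_0(I)\cdot k|-\sup_{\substack{J\in D_{\bar\rho},\ I\in D,\ |J-I|<\bar\rho\\0<|k|\le K}}|(\omega_0(J)-\omega_0(I))\cdot k|\ge\frac{\bar\gamma}{K^\tau}-\bar\rho KM\ge\frac{\bar\gamma}{2K^\tau}.\tag{3.39}$$
The smallness condition (2.23) is easily checked, provided $E=\mu$ is small enough. Indeed, by the choices of $\gamma$ and $\bar\gamma$ one has $2^7c_{n_2}\mu/(\bar\alpha d)<1$, with $d=\min\{\bar\rho s_0,\varepsilon_0^2\}$. The condition $Ks_0\ge6$ is trivially satisfied, since $Ks_0=30\log\varepsilon^{-1}$ and $\varepsilon<e^{-1/5}$.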
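The parameter choices of Step 1 can be checked numerically on sample data. The Python sketch below builds the following quantities:

- $K$ from (3.36);
- $c_{n_2}=e(1+en_2)/2$ as in Proposition 2.1;
- $\gamma$ at its stated lower threshold;
- $\bar\gamma$ at its smallest admissible value;
- $\bar\alpha=\bar\gamma/(2K^\tau)$ and $\bar\rho$ from (3.37).

It then verifies four facts: the identity $e^{-Ks_0/6}=\varepsilon^5$; the last inequality in (3.39); the condition $Ks_0\ge6$; and the smallness condition $2^7c_{n_2}\mu/(\bar\alpha d)<1$ with $d=\min\{\bar\rho s_0,\varepsilon_0^2\}$.

The numbers $\varepsilon$, $\mu$, $s_0$, $\tau$, $M$, $\rho_0$, $n_2$, $\varepsilon_0$ are illustrative assumptions, not values from the paper. $\mu$ is taken extremely small because the estimates only become meaningful in the asymptotic regime of Theorem 1.4, where $\bar\gamma$ must itself be small. The helper functions are hypothetical.

```python
import math


def step1_parameters(eps, mu, s0, tau, M, rho0, n2, eps0):
    """Hypothetical helper: Step 1 choices (3.36)-(3.37) for sample data."""
    log_inv = math.log(1.0 / eps)
    K = 30.0 / s0 * log_inv                      # (3.36): exp(-K*s0/6) = eps**5
    c_m = math.e * (1.0 + math.e * n2) / 2.0     # c_m of Proposition 2.1, m = n2
    gamma = max(1.0, 2**5 * (30.0 / s0) ** (tau + 1) * c_m * M)
    gamma_bar = gamma * math.sqrt(mu) * log_inv ** (tau + 1)  # smallest admissible value
    alpha_bar = gamma_bar / (2.0 * K**tau)
    rho_bar = min(gamma_bar / (2.0 * M * K ** (tau + 1)), rho0)
    d = min(rho_bar * s0, eps0**2)
    return {"K": K, "c_m": c_m, "gamma": gamma, "gamma_bar": gamma_bar,
            "alpha_bar": alpha_bar, "rho_bar": rho_bar, "d": d}


def check_step1(eps, mu, s0, tau, M, rho0, n2, eps0):
    """Return the truth values of the Step 1 requirements for sample data."""
    p = step1_parameters(eps, mu, s0, tau, M, rho0, n2, eps0)
    K, gb = p["K"], p["gamma_bar"]
    lower = gb / K**tau - p["rho_bar"] * K * M   # middle expression in (3.39)
    return {
        "remainder_is_eps5": math.isclose(math.exp(-K * s0 / 6.0), eps**5, rel_tol=1e-9),
        "nonresonance_3_39": lower >= p["alpha_bar"] * (1.0 - 1e-12),
        "Ks_at_least_6": K * s0 >= 6.0,
        "smallness_2_23": 2**7 * p["c_m"] * mu / (p["alpha_bar"] * p["d"]) < 1.0,
    }


print(check_step1(eps=1e-3, mu=1e-30, s0=0.5, tau=2.5, M=2.0, rho0=0.1, n2=2, eps0=0.5))
```

The relative tolerance in the (3.39) check only absorbs floating-point rounding. With $\bar\rho=\bar\gamma/(2MK^{\tau+1})$, the two sides of the last inequality in (3.39) coincide exactly.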
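Thus, by Proposition 2.1, we find a real-analytic symplectomorphism
$$\bar\phi:(\bar I,\bar\varphi,\bar p,\bar q)\in W_{\bar v,\bar s}\mapsto(I,\varphi,p,q)\in W_{v_0,s_0},\qquad\bar v:=v_0/2=(\bar\rho/2,\varepsilon_0/2),\quad\bar s:=s_0/6,$$
where $W_{v_0,s_0}:=D_{\bar\rho}\times\mathbb T^{n_1}_{s_0}\times B_{\varepsilon_0}$ and $v_0:=(\bar\rho,\varepsilon_0)$. By the choice of $K$ in (3.36), $H$ is transformed into¹⁸
$$\bar H(\bar I,\bar\varphi,\bar p,\bar q)=H\circ\bar\phi(\bar I,\bar\varphi,\bar p,\bar q)=H_0(\bar I)+\mu\bar N(\bar I,\bar p,\bar q)+\mu e^{-Ks_0/6}\bar P(\bar I,\bar\varphi,\bar p,\bar q)=H_0(\bar I)+\mu\bar N(\bar I,\bar p,\bar q)+\mu\varepsilon^5\bar P(\bar I,\bar\varphi,\bar p,\bar q).\tag{3.40}$$

¹⁷ By abuse of notation, we use the same names for the variables. Strictly speaking, they differ from the original variables by a quantity of order $O(|(p,q)|^2)$ in $(p,q)$ and $O(|(p,q)|^3)$ in $\varphi$; the actions $I$ are the same.
¹⁸ For simplicity of notation, we do not write explicitly the dependence on $\mu$, $\varepsilon$, $\bar\gamma$. That is, we write $\bar H(\bar I,\bar\varphi,\bar p,\bar q)$, etc., in place of $\bar H(\bar I,\bar\varphi,\bar p,\bar q;\mu,\varepsilon,\bar\gamma)$, etc.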
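The bounds (3.41)–(3.42) that follow come from (2.25)–(2.26) by direct substitution of the Step 1 data. The data are $E=\mu$, $\alpha_1=\alpha_2=\bar\alpha=\bar\gamma/(2K^\tau)$, $s=s_0$, $r=\bar\rho$, $r_p=r_q=\varepsilon_0$ and $K=(30/s_0)\log\varepsilon^{-1}$. We assume that the minimum in (3.37) is attained by $\bar\gamma/(2MK^{\tau+1})$ and that $d=\bar\rho s_0$, which is the case for small $\bar\gamma$.

The sketch below shows the bookkeeping; $C$ denotes constants depending only on $s_0$, $M$, $\tau$, $n_2$, $\varepsilon_0$. The last line uses $\Pi_{\{0\}}T_K(\mu P)=\mu P_{\rm av}$, $g=\mu\bar N$, and the fact that the sup-Fourier norm dominates the sup norm.

```latex
\begin{aligned}
|I-\bar I| &\le \frac{9\mu}{\bar\alpha s_0}=\frac{18\,\mu K^{\tau}}{\bar\gamma s_0}
  \le C\,\frac{\mu(\log\varepsilon^{-1})^{\tau}}{\bar\gamma},\\
|p-\bar p|,\ |q-\bar q| &\le \frac{9\mu}{\bar\alpha\varepsilon_0}=\frac{18\,\mu K^{\tau}}{\bar\gamma\varepsilon_0}
  \le C\,\frac{\mu(\log\varepsilon^{-1})^{\tau}}{\bar\gamma},\\
|\varphi-\bar\varphi| &\le \frac{9\mu}{\bar\alpha\bar\rho}=\frac{36\,M\mu K^{2\tau+1}}{\bar\gamma^{2}}
  \le C\,\frac{\mu(\log\varepsilon^{-1})^{2\tau+1}}{\bar\gamma^{2}},\\
\mu\sup|\bar N-P_{\rm av}| &\le \frac{12}{11}\,\frac{2^{7}c_{n_2}\mu^{2}}{\bar\alpha d}
  =\frac{12}{11}\,\frac{2^{9}c_{n_2}M\mu^{2}K^{2\tau+1}}{\bar\gamma^{2}s_0}
  \le C\mu\,\frac{\mu(\log\varepsilon^{-1})^{2\tau+1}}{\bar\gamma^{2}}.
\end{aligned}
```

Dividing the last line by $\mu$ gives the order $\mu(\log\varepsilon^{-1})^{2\tau+1}/\bar\gamma^2$ of $\bar N-P_{\rm av}$. This is the small parameter that drives Steps 2–4.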

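By (2.25), $\|\bar P\|_{\bar v,\bar s}\le C$ and
$$\sup_{D_{\bar\rho/2}}|\bar N-P_{\rm av}|\le C\frac{\mu(\log\varepsilon^{-1})^{2\tau+1}}{\bar\gamma^2}.\tag{3.41}$$
In view of (2.26), the transformation $\bar\phi$ verifies
$$|I-\bar I|,\ |p-\bar p|,\ |q-\bar q|\le C\frac{\mu(\log\varepsilon^{-1})^{\tau}}{\bar\gamma},\qquad|\varphi-\bar\varphi|\le C\frac{\mu(\log\varepsilon^{-1})^{2\tau+1}}{\bar\gamma^2}.\tag{3.42}$$

Remark 3.1 The right-hand sides of (3.41) and (3.42) can be made as small as we please, provided $\mu$ and $\varepsilon$ are small and $\bar\gamma$ is chosen suitably. The precise choice will be discussed below.

Step 2 (Determination of the elliptic equilibrium for the secular system). $P_{\rm av}$ has a 4-non-resonant and non-degenerate elliptic equilibrium point at 0. In view of (3.41), $\bar N-P_{\rm av}$ is of order $\mu(\log\varepsilon^{-1})^{2\tau+1}/\bar\gamma^2$. By the Implicit Function Theorem and standard Cauchy estimates¹⁹, for small values of this parameter and for any fixed $\bar I\in D_{\bar\rho/2}$, $\bar N$ also has an elliptic equilibrium point $(p_0(\bar I),q_0(\bar I))$. This point is $\mu(\log\varepsilon^{-1})^{2\tau+1}/\bar\gamma^2$-close to 0. We can thus assume that $|(p_0(I),q_0(I))|<\varepsilon_0/4$ for every $I$, and consider a small neighborhood of radius $0<\varepsilon<\varepsilon_0/4$ around $(p_0(I),q_0(I))$. Let
$$\tilde\phi:(\tilde I,\tilde\varphi,\tilde p,\tilde q)\in W_{\tilde v,\tilde s}\mapsto(\bar I,\bar\varphi,\bar p,\bar q)\in W_{\bar v,\bar s},\qquad\tilde v:=(\bar\rho/4,\varepsilon),\quad\tilde s:=s_0/12,$$
be the transformation with generating function
$$S(\tilde I,\tilde p,\bar\varphi,\bar q)=\tilde I\cdot\bar\varphi+\big(\tilde p+p_0(\tilde I)\big)\cdot\big(\bar q-q_0(\tilde I)\big).$$
This transformation acts as the identity on the $\tilde I$ variables, shifts the equilibrium point to the origin, and suitably lifts the angles $\bar\varphi$, according to
$$\bar I=\tilde I,\quad\bar p=p_0(\tilde I)+\tilde p,\quad\bar q=q_0(\tilde I)+\tilde q,\quad\bar\varphi=\tilde\varphi-\partial_{\tilde I}\Big(\big(\tilde p+p_0(\tilde I)\big)\cdot\big(\bar q-q_0(\tilde I)\big)\Big)\Big|_{\bar q=q_0(\tilde I)+\tilde q}.$$
The transformation $\tilde\phi$ is close to the identity, since $\bar I=\tilde I$ and
$$|\bar p-\tilde p|,\ |\bar q-\tilde q|\le C\frac{\mu(\log\varepsilon^{-1})^{2\tau+1}}{\bar\gamma^2},\qquad|\bar\varphi-\tilde\varphi|\le C\max\Big\{\frac{\varepsilon^2(\log\varepsilon^{-1})^{\tau+1}}{\bar\gamma},\ \frac{\mu\varepsilon(\log\varepsilon^{-1})^{3\tau+3}}{\bar\gamma^3}\Big\}.\tag{3.43}$$
Let us check, for example, the bound on $\bar\varphi-\tilde\varphi$; the other bounds are immediate. Let $D(\mu,\varepsilon,\bar\gamma):=\{(\tilde I,\tilde p,\bar q):\tilde I\in D_{\bar\rho/4},\ (\tilde p,\bar q-q_0(\tilde I))\in B_\varepsilon\}$. Then, by Cauchy estimates,
$$|\bar\varphi-\tilde\varphi|\le\sup_{D(\mu,\varepsilon,\bar\gamma)}\Big|\partial_{\tilde I}\Big(\big(\tilde p+p_0(\tilde I)\big)\cdot\big(\bar q-q_0(\tilde I)\big)\Big)\Big|\le\frac{C}{\bar\rho}\sup_{D(\mu,\varepsilon,\bar\gamma)}\big|\tilde p+p_0(\tilde I)\big|\,\big|\bar q-q_0(\tilde I)\big|$$
$$\le C\Big(\varepsilon+\frac{\mu(\log\varepsilon^{-1})^{2\tau+1}}{\bar\gamma^2}\Big)\frac{\varepsilon(\log\varepsilon^{-1})^{\tau+1}}{\bar\gamma}\le C\max\Big\{\frac{\varepsilon^2(\log\varepsilon^{-1})^{\tau+1}}{\bar\gamma},\ \frac{\mu\varepsilon(\log\varepsilon^{-1})^{3\tau+3}}{\bar\gamma^3}\Big\}.$$

¹⁹ See, e.g., [4, Lemma A.1].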
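The equilibrium-shifting map of Step 2 is symplectic because it is defined through a generating function. We can check this numerically in the simplest case, one degree of freedom in each group ($n_1=n_2=1$). Solving the generating-function relations with $\bar q=q_0(\tilde I)+\tilde q$ gives
$$\bar\varphi=\tilde\varphi-p_0'(\tilde I)\,\tilde q+\big(\tilde p+p_0(\tilde I)\big)\,q_0'(\tilde I).$$
The Python sketch below computes the Jacobian $J$ of $(\tilde I,\tilde\varphi,\tilde p,\tilde q)\mapsto(\bar I,\bar\varphi,\bar p,\bar q)$ by central differences. It then measures the deviation of $J^{T}\mathbb J J$ from $\mathbb J$, where $\mathbb J$ is the matrix of the form $dI\wedge d\varphi+dp\wedge dq$. The equilibrium curve $p_0(I)=a\sin I$, $q_0(I)=b\cos I$ and its amplitudes are hypothetical choices made only for the test.

```python
import math

AMP_P, AMP_Q = 0.3, 0.2  # hypothetical amplitudes of the equilibrium curve


def p0(I):
    return AMP_P * math.sin(I)


def q0(I):
    return AMP_Q * math.cos(I)


def dp0(I):
    return AMP_P * math.cos(I)


def dq0(I):
    return -AMP_Q * math.sin(I)


def shift_map(x):
    """(I~, phi~, p~, q~) -> (I-bar, phi-bar, p-bar, q-bar), n1 = n2 = 1."""
    I, phi, p, q = x
    return [I,
            phi - dp0(I) * q + (p + p0(I)) * dq0(I),  # angle lift
            p + p0(I),
            q + q0(I)]


def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian, J[i][j] = d f_i / d x_j."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(f(xp), f(xm))])
    return [[cols[j][i] for j in range(n)] for i in range(n)]


# Matrix of the two-form dI^dphi + dp^dq in the ordering (I, phi, p, q).
OMEGA = [[0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 0, 1], [0, 0, -1, 0]]


def symplectic_defect(x):
    """max |(J^T OMEGA J - OMEGA)_{ij}| at the point x."""
    J = jacobian(shift_map, x)
    n = len(x)
    jt_om = [[sum(J[k][i] * OMEGA[k][l] for k in range(n)) for l in range(n)]
             for i in range(n)]
    prod = [[sum(jt_om[i][l] * J[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]
    return max(abs(prod[i][j] - OMEGA[i][j]) for i in range(n) for j in range(n))


print(symplectic_defect([0.7, 1.1, 0.05, -0.02]))
```

The defect is of the size of the finite-difference error only. Dropping the angle lift term from `shift_map` makes the map non-symplectic, which is why the lift of $\bar\varphi$ appears in Step 2.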

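By construction, the transformation $\tilde\phi$ puts $\bar H$ into the form
$$\tilde H:=\bar H\circ\tilde\phi=H_0(\tilde I)+\mu\tilde N(\tilde I,\tilde p,\tilde q)+\mu\varepsilon^5\tilde P(\tilde I,\tilde\varphi,\tilde p,\tilde q),\qquad\tilde N:=\bar N\circ\tilde\phi,\quad\tilde P:=\bar P\circ\tilde\phi.$$
Observe that $\|\tilde P\|_{\tilde v,\tilde s}\le C$ and that $\tilde N$ has a 4-non-resonant and non-degenerate elliptic equilibrium point at the origin of the $(\tilde p,\tilde q)$ coordinates.

Step 3 (Symplectic diagonalization of the secular system). The standard diagonal form (2.28) can be achieved by a symplectic diagonalization as in [17]. In fact, by [17], one can find a symplectic map
$$\hat\phi:(\hat I,\hat\varphi,\hat p,\hat q)\in W_{\hat v,\hat s}\mapsto(\tilde I,\tilde\varphi,\tilde p,\tilde q)\in W_{\tilde v,\tilde s},\qquad\hat v:=(\bar\rho/8,\varepsilon/2),\quad\hat s:=s_0/24,$$
which acts as the identity on the $\hat I$ variables, is linear in the variables $(\hat p,\hat q)$, and is close to the identity in the sense that
$$|\tilde p-\hat p|,\ |\tilde q-\hat q|\le C\frac{\mu\varepsilon(\log\varepsilon^{-1})^{2\tau+1}}{\bar\gamma^2},\qquad|\tilde\varphi-\hat\varphi|\le C\frac{\mu\varepsilon^2(\log\varepsilon^{-1})^{3\tau+2}}{\bar\gamma^3}.\tag{3.44}$$
These estimates follow from three facts:
- the assumptions on $P_{\rm av}$ (compare the preliminary step above);
- the estimate $\tilde N=P_{\rm av}+O\big(\mu\varepsilon^2(\log\varepsilon^{-1})^{2\tau+1}/\bar\gamma^2\big)$, by which $\tilde N$ is $O\big(\mu\varepsilon^2(\log\varepsilon^{-1})^{2\tau+1}/\bar\gamma^2\big)$-close to being diagonal;
- Cauchy estimates²⁰.

Moreover, one has²¹
$$\hat N(\hat I,\hat p,\hat q):=\tilde N\circ\hat\phi(\hat I,\hat p,\hat q)=P_0(\hat I)+\hat\Omega(\hat I)\cdot\hat r+\hat R,\tag{3.45}$$
where $|\hat\Omega-\Omega|\le C\mu\varepsilon(\log\varepsilon^{-1})^{2\tau+1}/\bar\gamma^2$ and $\hat R$ has a zero of order 3 at $(\hat p,\hat q)=0$. The map $\hat\phi$ transforms $\tilde H$ into
$$\hat H:=\tilde H\circ\hat\phi=H_0(\hat I)+\mu\hat N(\hat I,\hat p,\hat q)+\mu\varepsilon^5\hat P(\hat I,\hat\varphi,\hat p,\hat q),\qquad\hat P:=\tilde P\circ\hat\phi.$$

Step 4 (Birkhoff normal form of the secular part). By Proposition 2.2 and the subsequent remark, there exists a Birkhoff transformation
$$\check\phi:(\check I,\check\varphi,\check p,\check q)\in W_{\check v,\check s}\mapsto(\hat I,\hat\varphi,\hat p,\hat q)\in W_{\hat v,\hat s},\qquad\check v:=(\bar\rho/16,\varepsilon/4),\quad\check s:=s_0/48.$$
This transformation acts as the identity on the $\check I$ variables and, by (2.31), is close to the identity:
$$|\hat p-\check p|,\ |\hat q-\check q|\le C\frac{\mu\varepsilon^2(\log\varepsilon^{-1})^{2\tau+1}}{\bar\gamma^2},\qquad|\hat\varphi-\check\varphi|\le C\frac{\mu\varepsilon^3(\log\varepsilon^{-1})^{3\tau+2}}{\bar\gamma^3}.\tag{3.46}$$
These estimates follow from (2.31) and three observations:
- the coefficients $c_{\alpha,\beta}$ of the non-normal part of $\hat N$ can be bounded by $\mu(\log\varepsilon^{-1})^{2\tau+1}/\bar\gamma^2$, uniformly in $I$;
- $\alpha$ can be taken of order 1 in $\varepsilon$ and $\mu$;
- $\rho_0$ is of order $\bar\gamma/(\log\varepsilon^{-1})^{\tau+1}$.

Furthermore, $\check\phi$ puts $\hat N$ into Birkhoff normal form up to order 4. Since $|(\check p,\check q)|<\varepsilon/4$ on $W_{\check v,\check s}$, the term $\mu\,O(|(\check p,\check q)|^5)$ is $O(\mu\varepsilon^5)$ and can be absorbed into the remainder. Hence $\check\phi$ transforms $\hat H$ into
$$\check H:=\hat H\circ\check\phi=H_0(\check I)+\mu\Big(P_0(\check I)+\hat\Omega(\check I)\cdot\check r+\frac12\check r\cdot\check\beta(\check I)\check r+O(|(\check p,\check q)|^5)\Big)+\mu\varepsilon^5\hat P\circ\check\phi=:H_0(\check I)+\mu\check N(\check I,\check r)+\mu\varepsilon^5\check P(\check I,\check\varphi,\check p,\check q),$$
where
$$\check N(\check I,\check r)=P_0(\check I)+\hat\Omega(\check I)\cdot\check r+\frac12\check r\cdot\check\beta(\check I)\check r,\qquad\check r_i=\frac{\check p_i^2+\check q_i^2}{2},\qquad\|\check P\|_{\check v,\check s}\le C.$$

²⁰ The generating function of this transformation is $\mu\varepsilon^2(\log\varepsilon^{-1})^{2\tau+1}/\bar\gamma^2$-close to the generating function $\hat I\cdot\tilde\varphi+\hat p\cdot\tilde q$ of the identity map. Taking derivatives and using Cauchy estimates, we obtain (3.44). The analyticity loss is $C\varepsilon$ in $(\hat p,\hat q)$ and $\bar\gamma/(\log\varepsilon^{-1})^{\tau+1}$ in $\hat I$ (for the estimate on $\hat\varphi$).
²¹ $\hat r_i:=(\hat p_i^2+\hat q_i^2)/2$.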

Step 5 Global action angle variables for the full system We finally introduce a set of action angle variables using symplectic polar coordinates. Fix the real n dimensional annulus } A(ɛ) := {J R n : č 1 ɛ 5/ < J i < č ɛ, 1 i n (3.47) where č 1 will be fixed later on so as to maximize the measure of preserved tori and ɛ small enough with respect to 1/č 1, while č is a constant depending only on the dimensions. Let D := D A(ɛ), ρ := min{č 1 ɛ 5/ /, ρ/16}, s := ŝ = s 0 /48, (3.48) ( ) where D is the set in (3.37). On D ρ T n s, let φ : (J, ψ) = (J 1, J ), (ψ 1, ψ ) (Ǐ, ˇϕ, ˇp, ˇq) be defined by J 1 = Ǐ, ψ 1 = ˇϕ, ˇp i = J i cos ψ i, ˇq i = J i sin ψ i 1 i n. (3.49) For ɛ small enough, (ˇp, ˇq) B ɛ/4. The transformation φ puts Ȟ into the form H(J, ψ) := Ȟ φ = H 0 (J 1 ) + µ N(J) + µɛ 5 P (J, ψ), N := Ň φ = P 0 (J 1 ) + ˆΩ(J 1 ) J + 1 J ˇβ(J 1 )J, where P := ˆP φ From the above construction there follows that the transformation φ := φ φ ˆφ ˇφ φ : (J, ψ) (I, ϕ, p, q) (3.50) is well defined 3 and verifies I J 1 C µ(log ɛ 1 ) τ ϕ ψ 1 C max { µ(log ɛ 1 ) τ+1, ɛ (log ɛ 1 ) τ+1, } µɛ(log ɛ 1 ) 3τ+3 3 p i p 0 i J i cos ψ i, q i q 0 i J i sin ψ i C max { µ(log ɛ 1 ) τ, µɛ(log ɛ 1 ) τ+1 }. (3.51) Step 6 Construction of the Kolmogorov set and estimate of its measure Fix γ 1 and γ = µ, with γ 1, satisfying µ γ 1 and (1.17). We apply now the two scale KAM Theorem (Proposition.3), with (compare Step 5 above) H = H, h = H 0 (J 1 ) + µ N(J), f = µɛ 5 P (J, ψ), D = D, ρ = ρ and s = s/5, s = 4 s/5. It is easy to check that, for small values of 4 µ(log ɛ 1 ) (τ+1), the frequency map ω µ := (H 0 (J 1 )+µ N(J)) is a diffeomorphism of D ρ, with non singular hessian matrix (H 0 (J 1 )+ µ N(J)). Then, we see that (for a suitable constant C) we can take M, ˆM,, M in Proposition.3 as follows: Then, and (recall also (3.37)) { ˆρ c min M = C, ˆM = Cµ, M = Cµ 1, E = Cµɛ 5, M1 = C, M = Cµ 1. 
L Cµ 1, K C log (ɛ 5 /γ 1 ) 1 γ 1 (log (ɛ 5 /γ 1 ) 1 ) τ+1, (log (ɛ 5 /γ 1 ) 1 ) τ+1, } (log ɛ 1 ) τ+1, č 1ɛ 5/, ρ 0. (3.5) This is needed to avoid the singularity introduced by the polar coordinates. Notice that J i ɛ compared to (p, q) ɛ. 3 If is chosen as to satisfy the first inequality in (1.17), then, the right hand sides of (3.41), (3.4), (3.43), (3.44), (3.45) and (3.51) can all be bounded by 1/γ. Choosing γ big enough, the quantities involved are small as we please. 4 Such inequality is implied by γ µ log(ɛ 1 ) τ+1 <, which appears in (1.17). 15

Finally, ĉê {ɛ ( ( C max 5 γ1 )) (τ+1) 1 log max{ ɛ 5 γ1, 1 }, ɛ5 (log ɛ 1 ) (τ+1), 1 č 1, ɛ 5 ρ 0 }, (3.53) with a constant C not involving č 1. Then, from (1.17) it follows that { 1 ĉê < C max, 1 γ č, ɛ 5} < 1 (3.54) 1 provided γ, č 1 > C and ɛ 5 < C 1. Finally, since the KAM condition ĉê < 1 is met, Proposition.3 holds in this case. In paricular, for any ω in the set Ω := D γ1,µ,τ ω µ (D), we find a real analytic embedding φ ω : T n T ω := φ ω (T n ) Re (D r ) T n with r Ĉ ˆρ C such that, on T ω, the H flow analytically conugated to ϑ ϑ + ω t. We set T ω := φ(t ω ), where φ is the symplectic transformation defined in (3.50). Using (3.51) and (.35), the parametric equations of T ω may be written as ( I = J1 0 + J 1 with J µ(log ɛ 1 ) τ ) 1 C + µˆρ (p i p 0 i ) + (q i qi 0 ) = Ji 0 + J i with č 1 ɛ 5/ < Ji 0 < č ɛ, and J ( µ (log ɛ 1 ) τ i C + µ ɛ (log ɛ 1 ) 4τ+ ) 4 + ˆρ, where (J1 0, J 0 ) is the ω µ pre image of ω, ˆρ is much smaller than č 1 ɛ 5/ (compare (3.5)); finally, by (3.51), (p 0, q 0 ) C µ(log ɛ 1 ) τ+1. It remains to estimate the measure of the Kolmogorov set K := φ(k) = T ω, where K := T ω. ω Ω ω Ω namely, (1.18). Let Dγ 1,µ,τ := ωµ 1 (D γ1,µ,τ ) D, where D is the set in (3.48), with D as in (3.37) and D γ1,µ,τ is defined ust before.1. Then, by (.34) and because φ is volume preserving, we have meas K = meas K (3.55) ( ) meas ( Re (D r ) T n ) C meas (D \ Dγ 1,µ,τ T n ) + meas ( Re (D r ) \ D T n ). Now, let V := V B n ĉ ɛ, where B n č ɛ denotes the open set { J i < č ɛ }. Observe that D V; define P ɛ := V T n1 {p i + q i < ɛ } (compare (1.5)). Then, by the estimate (3.38) and the definition (3.47) of A(ɛ), meas (D r T n ) meas (D T n ) Similarly, denoting for short B := B n č ɛ, one has that 5 = meas D meas (A(ɛ)) meas (T n ) (1 C Cɛ n/ ) meas (V T n ) = (1 C Cɛ n/ ) meas (P č ɛ ) (3.56) meas ( Re (V r ) \ V ) C meas V, meas (B r \ B) meas (B C \ B) C ɛ meas B meas (V \ D) C( + ɛ n/ ) meas V. 5 Recall that r < C. 16

Thus, meas Re (D r ) \ D T n meas Re (V r ) \ D T n (3.57) meas Re (V r ) \ V T n + meas V \ D T n C( ɛ + + ɛn/ ) meas (V T n ) = C( ɛ + + ɛn/ ) meas (P č ɛ ). Finally, the frequency map ω µ := ( J1 (H 0 + µ N), µ J N) is a diffeomorphism of a ρ-neighborhood of D B n č. Note that ω ɛ µ as a function of J 1 is defined on D ρ and as a function of J is a polynomial; notice also that B n č ɛ is ust the full closed ball around the annulus A(ɛ) (compare (3.47)). Then, the measure of the set D \ Dγ 1,µ,τ does not exceed the measure of the (γ 1, µ ) resonant set for ω µ in the set D B n č ɛ. Such set of resonant points may be estimated by the following technical Lemma, whose proof is deferred to Appendix C. Lemma 3.1 Let n 1, n N, τ > n := n 1 + n, γ 1, γ > 0, 0 < ˆr < 1, D be a compact set. Let ω = (ω 1, ω ) : D B n ˆr Ω R n1 R n be a function which can be extended to a diffeomorphism on an open neighborhood of D B n ˆr, with ω of the form ω (I 1, I ) = ω 0 (I 1 ) + β(i 1 )I where I 1 β(i 1 ) is a (n n ) matrix, non singular on D. Let and denote Then, where R 1 > max D B n ˆr R γ1,γ,τ := ω 1, a > max β, c(n, τ) := 1 D k τ, 0 k Z n { I = (I 1, I ) D } B n ˆr : ω(i) / D γ1,γ,τ. ( ) ( γ ) meas R γ1,γ,τ c 1 γ 1 + c ˆr c 1 := max D B n ˆr c := max D for a suitable integer p depending on D and ω 1. ( ) meas D B n ˆr ( ω) 1 n Rn1 1 1 an c(n, τ) p meas D β 1 n a n 1 c(n, τ) By Lemma 3.1 with γ 1 as in Step 6, D as in (3.37) and ( ˆr = č ɛ, ω = ω resc := J1 (H 0 + µ N), ) J N, γ =, a = R 1 = C, we see that meas (D \ D γ 1,µ,τ T n ) meas (R γ1,,τ T n ) max{c 1, c č }(γ 1 + ɛ ) meas ( D Bče Tn ) (3.58) max{c 1, c č }(γ 1 + ɛ ) meas P č ɛ (3.59) with c i independent of ɛ and µ. Then, in view of (3.55) (3.59), (1.18) follows, with ɛ replaced by č ɛ. The proof of Theorem 1.4 is finished. 17

4 Proof of Theorem 1.3 Since most of the arguments are similar (but simpler) than the ones used in the proof of Theorem 1., we will skip most technical details. We can write P av (p, q; I) = N(I, r) + P av (p, q; I), where and, for a suitable C > 0, N(I, r) := P 0 (I) + Ω i (I)r i + 1 β i (I)r i r 1in 1i,n sup B n ɛ V ρ0 P av Cɛ 5, 0 < ɛ < ɛ 0. (4.1) Step 1 Fix τ > n 1, 0 < ɛ < ɛ 0, and ɛ 6 ( 30 ) τ+1 µ(log ɛ 1 µ < (log(ɛ 1 )) τ+1, ) τ+1 s 0 ɛ 5/. (4.) In place of Proposition.1, we use Lemma A.1 below, where we take r p = r q = ɛ 0, ρ = ρ with ρ as in (3.37), ρ p = ρ q = ɛ 0 /4, σ = s 0 /6 and the remaining quantities as in Step 1 of the proof of Theorem 1.4, namely, l 1 = n 1, l = 0, m = n h = H 0, g 0, f = µp, A = D, r = ρ, as in (3.37) B = B = {0}, s = s 0, α 1 = α = ᾱ = K, where K as in (3.36) and Λ = {0}. With such choice, the check of the non τ resonance assumption (A.1) for ω 0 = H 0 in D ρ is the same as in Step 1 of the proof of Theorem 1.4 and the smallness condition (A.) is implied by (4.). Then, there exists a symplectic transformation φ such that H φ = H 0 + g + + f + as in Lemma A.1. Since g + coincides with µp av, on the domain Wṽ, s (recall the definition of W v,s ust above the (3.40)), where ṽ = ( ρ/, ɛ 0 /) and s = s 0 /3, we find H(Ī, ϕ, p, q) := H φ(ī, ϕ, p, q) (4.3) = H 0 (Ī) + µp av( p, q; Ī) + P (Ī, ϕ, p, q) = H 0 (Ī) + µn(ī, r) + µ P av ( p, q; Ī) + P (Ī, ϕ, p, q). By (A.6) below, the transformation φ satisfies the estimates (3.4). Furthermore, by (4.), the choice of K in (3.36) and (A.5) below, the function P in (4.3) satisfies P v, s Cµ τ+1 µ K max{, µ K τ e Ks 0/6 } Cµɛ 5. (4.4) By such estimate and (4.1), the perturbation P := µ P av + P, on the smaller domain W v, s, where v = ( ρ/, ɛ/), s = s, is bounded by Cµɛ 5. Step and conclusion At this point, we proceed as in Steps 5 and 6 of Theorem 1.4, with W v, s, N and P replacing, respectively, W v, s, N and µɛ 5 P. 
Now, choose γ big enough (so that the KAM condition (3.53), (3.54) is satisfied), and fix γ, satisfying µ γ 1 and last two lines in (1.17). Then, we can find a set of invariant tori K D r T n1 {č 1 ɛ 5/ < p i + q i < č ɛ } r (P č ɛ ) r (with r < C ) satisfying the measure estimate meas ( P č \ K ) ( ɛ meas P ) ) čɛ r \ K C( + γ1 + ɛ + ɛn/ ) meas P č ). ɛ Finally, taking, as in the proof of Theorem 1., γ 1, as in (1.0), and choosing as the value in the right hand side of (4.), the theorem is proved with K := K Pčɛ and č ɛ replacing ɛ. 18

A Averaging theory (Proposition.1) In this appendix we generalize Proposition A.1 in [4] to a two frequency scale, as needed in Appendix B below. Proposition A.1 in [4] is based on the application of an iterative lemma. The following lemma is the (easy) generalization of the iterative lemma (Lemma A.5) in [4] suitable for our purpose. Lemma A.1 Let 0 < α α 1, l = l 1 + l with l i N and let Λ be a sublattice of Z l. Let g = X g k (u)e ik ϕ k Λ and H(u, ϕ) = h(i) + g(u, ϕ) + f(u, ϕ) be real analytic on W v,s := A r B rp B r q T l s, where A B B (R l 1 R l ) R m R m and v = (r, r p, r q). Let ρ < r/, ρ p < r p/, ρ q < r q/, σ < s/, ν := (ρ, ρ p, ρ q). Suppose that k = (k 1, k ) Z l 1 Z l and that ω(i) k α1 if k 1 0 α if k 1 = 0 k Z l 1 Z l, k / Λ, k K, I A r. (A.1) Assume also that the following smallness condition holds: f v,s < αδ c m, where δ := min{ρσ, ρ pρ q}, c m := e(1 + em)/. (A.) Then,there exists a real analytic symplectic transformation such that with g + g = Π ΛT Kf and f + v ν,s σ φ : (I, ϕ, p, q ) W v ν,s σ (I, ϕ, p, q) W v,σ, (A.3) H + := H φ = h + g + + f +, 1» 1 cm cm α δ f v,s α δ f v,s + {g, H φ } v,s + e Kσ f v,s (A.4). (A.5) Furthermore, the following uniform bounds hold: ff max α 1σ I 1 I 1, α σ I I, α ρ ϕ ϕ, α ρ q p p, α ρ p q q f v,s. (A.6) Proof Assumptions (A.1), (A.) allow to apply the iterative lemma [4, Lemma A.5], with n = l, D = A, E = B, F = B, K = K, α = α, so as to find an analytic transformation Φ := φ verifying (A.3) (A.5) and the bounds on I I, p p, q q into (A.6). In order to prove the bound on I 1 I 1, we recall that such transformation is obtained 6 as the time one map associated to the Hamiltonian flow of Then, one can split H φ as H φ (u, ϕ) = X k K, k Λ f k (u) ik ω(i) eik ϕ. (A.7) with H φ (u, ϕ) = H (1) φ H (1) φ (u, ϕ) := X k K, k Λ,k 1 0 (u, ϕ) + H() φ (u, ϕ), where ϕ = (ϕ1, ϕ) Tl 1 T l f k (u) ik ω(i) eik ϕ, H () φ (u, ϕ :) = X k K, (0,k ) Λ f k (u) ik ω (I) eik ϕ. 
Using (A.1), one finds that H (1) φ, H() φ verify 6 Compare [4, (A.14)]. H (1) f v,s φ v,s, H () φ α 1 v,s f v,s α. (A.8) 19

Since H () φ is independent on ϕ 1, from the generating equations of φ, equation (A.8) and Cauchy estimates, the bounds for I 1 I 1 (A.6) follow. Finally, when f does not depend on (p, q) one can simply take m = 0. We now may proceed to sketch the proof of the Averaging Theorem in, i.e., Proposition.1. By the same considerations of footnote 0 of [4], we can limit ourselves to the case e Ks/6 3c me/α d. As in [4], we apply once Lemma A.1 with ν = ν 0 := v/8; σ = σ 0 := s/6, thus, constructing a transformation Φ 0 : W1 := W v1, s+s 1 W v, s+s, with v 1 = 3/4v, s 1 = /3s which transforms H = h + f into H 1 = h + g 0 + f 1. Similarly to (A.19) of [4], it follows that f 1 W1 E. By (A.6), one can replace (A.0) of [4] with 6 α 1s I (1) 1 I 1, α s I (1) I, α r q p (1) p, α r p q (1) q, α r ϕ (1) ϕ 8E. Next, one proceeds as in (A1) (A.6) of [4], with W i := W vi, s+s i replacing W i, α replacing α, E i := f i Wi replacing ɛ i, in order to prove (.5). Finally, (.6) follows by the same telescopic argument as in 7 [4], except for taking into account, as done above, the double scale (A.6) of the α i s. B Two scale KAM theory (proof of Proposition.3) The proof of Proposition.3 is based on the following iterative lemma. For the purpose of this proof, we replace τ with τ. Lemma B.1 (Iterative Lemma) Under the same assumptions and notations of Proposition.3, for any 1 N, there exists a domain D R n, two positive numbers ρ, s and a real analytic and symplectic transformation Φ on W := (D ) ρ T n s+s which conugates H 0 := H to H = H 0 Φ = h + f and such that the following holds. Letting, for = 0, D 0 := ω 1 0 (Dγ 1,γ,τ ) D, s 0 := s, ρ 0 := ρ, M 0 := M, ˆM 0 := M, M0 := M, L 0 := L, E 0 := E, K 0 := K, ˆρ 0 := ˆρ, Ê 0 := Ê and, for 1, s := s 1/1, M := M 1, ˆM := ˆM 1, L := L 1, ρ := ˆρ 1/16, then E := E 1L 1M 1, K γ := 6 EL M 1 log 1 s + (B.1) γ 1 ff γ 1 γ ˆρ := min,, ρ (B.) 
3M K τ+1 3 ˆM K τ+1, Ê = EL ˆρ (i) D (D 1)ˆρ 1 /16; the frequency map ω := h is a diffeomorphism of (D ) ρ such that ω (D ) = ω 1(D 1); the map î = (î 1, î ) : I D 1 ω 1 ω 1(I) D verifies sup î 1 id 5n M 1 Ê 1 ˆρ 1 5nÊ 1 ˆρ 1, D 1 M sup î id 5n M Ê 1 ˆρ 1 5nÊ 1 ˆρ 1 D 1 M (B.3) L(î id ) 6 nê 1 (B.4) (ii) the perturbation f has sup Fourier norm on W 7 Compare [4]: last formula before Appendix B. f W E (B.5) 0

(iii) the real analytic symplectomorphism Φ is obtained as Φ = Ψ 1 Ψ, where verifies Ψ k : (I k, ϕ k ) W k (I k 1, ϕ k 1 ) W k 1 sup I k 1,1 (I k, ϕ k ) I k,1 3 ˆM (D k ) ρk T n s+s 4 M Êk 1 ˆρ k 1 k sup (D k ) ρk T n s+s k I k 1, (I k, ϕ k ) I k, 3 4 Êk 1 ˆρ k 1 sup ϕ k 1 (I k, ϕ k ) ϕ k 3 (D k ) ρk T n s+s 4 Êk 1s k 1 (B.6) k and the rescaled dimensionless map ˇΨ k id := 1ˆρ,s Ψ k 1 1 ˆρ,s id, has Lipschitz constant on (Ďk) ρk /ρ Ť n ( s+s k )/s L( ˇΨ 4 τ+1 k 1 k id ) 4n Êk 1 (B.7) 6 where id denotes the identity map, 1 d denotes the d d identity matrix, 1 ρ,σ := (ρ 1 1 n, σ 1 1 n), Ďk := ˆρ 1 D k, Ť := R/(π/s)Z; (iv) for any 1, Ê < Ê 1. Remark B.1 Lemma B.1 generalizes the inductive theorem of [3, p.144]. In [3], the quantities E, γ 1, γ are estimated as µɛ 7, ɛ +a, µɛ +a, respectively 8. Indeed, our approach allows to have E µɛ 5 (and hence, essentially, to replace assumption (A) with (A )), taking for γ 1, γ the smallest possible values compatibly with convergence, namely, γ 1 ɛ 5/, γ µɛ 5/ (compare (1.0) above). Such smaller choice of γ 1, γ with respect to [3] is important in order to improve the density of the invariant set as in (1.18). Proof The proof is based on Proposition.1 (with m = 0, l 1 = n 1, l = n, B = B = ). Notice that, by assumption and the choice of D 0, the following inequalities hold, for 1 = 0 ĉê 1 < 1 L 1 max M 1, M 1, ff 1 ˆM 1 (B.8) (B.9) ω(d 1) D γ1,γ,τ. (B.10) Let us assume (inductively) that, when 1 0 (B.8), (B.9) and (B.10) hold and that,for 1 1, the Lemma holds with Φ 1 = Ψ 1 Ψ 1. In order to describe the th step, for simplicity, we write ρ, ˆρ, s, M, ˆM, M, L, K, E, Ê, D, H, h, f, ω = (ω 1, ω ) for ρ 1, ˆρ 1, s 1,etc., and ρ +, ˆρ +, s +, etc., for ρ, ˆρ, s, etc., (the corresponding initial quantities will be called, as in the statement, ρ 0, ˆρ 0, s 0, etc., ). 
By (B.10) and the choice of ˆρ (equation (B.)), when 0 < k K and I Dˆρ the following non resonance inequalities hold (which are checked as in (3.39) above) 8 γ 1 >< =: α1 when k1 0 ; 3Kτ ω 1(I) k 1 + ω (I) k (B.11) >: γ =: α when k1 = 0 & k 0. 3Kτ The inequality Ks 6 is trivial by definition of K (see (B.1)) and also the smallness condition (.3) is easily met, since in this case d = ˆρs and hence 8 With a as in (1.7) above. 7 c 0 K α ˆρ f D ˆρ T n s+s 6 c 0 E ˆM ˆρ ĉe < 1 1

having used α ˆMK ˆρ, L ˆM 1, 6 c 0 < ĉ and (B.8). Thus, by Proposition.1 (with Λ = {0}, h = h,, g 0, f = f, Wˆρ,s = Dˆρ T n s+s), H may be conugated to where, by (.5) and the choice of K, f + D ˆρ/ T n s+s/6 H + := H Ψ + = h + + f + The conugation is realized by an analytic transformation e Ks/6 E ELM Ψ + : (I +, ϕ +) Dˆρ/ T n s+s/6 (I, ϕ) Dˆρ T n s+s. γ 1 E = E +. (B.1) Furthermore, in view of (.6) (with α 1, α as in (B.11)), of α 1 LK ˆρM/ ˆM, of α LK ˆρ, of Ks 6 and of L M 1, ˆM 1, we have sup I 1(I +, ϕ +) I +,1 3 ˆM 4 M Ê ˆρ. (B.13) D ˆρ/ T n s+s/6 Similarly, sup D ˆρ/ T n s+s/6 I (I +, ϕ +) I +, 3 Ê ˆρ, sup ϕ(i +, ϕ +) ϕ + 3 4 D ˆρ/ T n s+s/6 4 Ês. (B.14) Lemma B. The new frequency map ω + := h + is inective on Dˆρ/8 and maps Dˆρ/16 over ω(d). The map î + = (î +1, î +) := ω 1 + ω D which assigns to a point I0 D the ω+ preimage of ω(i0) in Dˆρ/16 satisfies sup î +1 id 5n M 1E 5n ME D ˆρ ˆρ, sup î + id 5n M E 5n ME D ˆρ ˆρ L(î + id ) 6 n ME ˆρ. (B.15) Finally, the Jacobian matrix U + := h + is non singular on Dˆρ/8 and the following bounds hold, where U 1 T+1 + =: T + M + := M M + := M. sup U +, ˆM+ := ˆM sup Û+, I D ˆρ/8 I D ˆρ/8 sup U 1 +, Mi+ := M i sup T i+, i = 1,. I D ˆρ/8 I D ˆρ/8 (B.16) Postponing for the moment the proof of this Lemma, we let ρ + := ˆρ/16, s + := s/1, and D + := î +(D). By Lemma B., D + is a subset of Dˆρ/16 and hence (D +) ρ+ Dˆρ/8. (B.17) At this point, (B.5) follows from (B.1) and (B.6) from (B.13), (B.14). We are now ready to prove that Ê+ = Ê. Since E + L + ˆρ + s + = s 1 E+L +M 1 + and x + := = x γ 1 8 ELM 1 where x := (B.18) γ 1 we have Thus, K + = 6 1 3 log 36 log log x + = 1 log x = 4K < 4K. s + s s s ( γ 1 ˆρ + = min 3M +K τ+1, + γ 3 ˆM +K τ+1 +, ρ + = ˆρ 16 ) ˆρ (4) τ+1 (B.19)