Math 259: Introduction to Analytic Number Theory
Exponential sums II: the Kuzmin and Montgomery-Vaughan estimates

[Blurb on algebraic vs. analytical bounds on exponential sums goes here]

While proving that an arithmetic progression with irrational step size is equidistributed mod 1, we encountered the estimate
$$\Bigl|\sum_{n=1}^N e(cn)\Bigr| \le \frac{2}{|e(c)-1|} = \frac{1}{|\sin \pi c|} \le \frac{1}{2\|c\|},$$
where $\|c\|$ is the distance from $c$ to the nearest integer. Kuzmin (1927) obtained a much more general estimate of this kind:

Proposition. Let $c_n$ ($0 \le n \le N$) be a sequence of real numbers whose sequence of differences $\delta_n := c_n - c_{n-1}$ ($1 \le n \le N$) is monotonic and contained in $[k+\lambda,\ k+1-\lambda]$ for some $k \in {\bf Z}$ and $\lambda > 0$. Then
$$\Bigl|\sum_{n=0}^N e(c_n)\Bigr| \le \cot \frac{\pi\lambda}{2} \le \frac{2}{\pi\lambda}.$$

Proof: Let
$$\zeta_n = \frac{e(c_{n-1})}{e(c_{n-1}) - e(c_n)} = \frac{1}{1 - e(\delta_n)}.$$
Note that the $\zeta_n$ are collinear: $\zeta_n = (1 + i \cot \pi\delta_n)/2$; since the sequence $\{\delta_n\}$ is monotonic and contained in $(k, k+1)$, the $\zeta_n$ are positioned consecutively on the vertical line ${\rm Re}(\zeta) = 1/2$. Now our exponential sum is
$$\sum_{n=0}^N e(c_n) = e(c_N) + \sum_{n=1}^N \bigl(e(c_{n-1}) - e(c_n)\bigr)\zeta_n
= (1-\zeta_N)\,e(c_N) + \zeta_1\, e(c_0) + \sum_{n=1}^{N-1} e(c_n)\,(\zeta_{n+1} - \zeta_n).$$
Thus
$$\Bigl|\sum_{n=0}^N e(c_n)\Bigr| \le |\zeta_1| + \sum_{n=1}^{N-1} |\zeta_{n+1} - \zeta_n| + |1 - \zeta_N|
= |\zeta_1| + |\zeta_N - \zeta_1| + |\zeta_N|,$$
where in the last step we used the monotonicity of ${\rm Im}(\zeta_n)$ and the fact that ${\rm Re}(\zeta_n) = 1/2$. The conclusion of the proof,
$$|\zeta_1| + |\zeta_N - \zeta_1| + |\zeta_N| \le \frac{1}{\sin \pi\lambda} + \cot \pi\lambda = \cot \frac{\pi\lambda}{2},$$
is an exercise in trigonometry. $\Box$

For instance, for $t/2\pi < N_1 < N$ we have $\sum_{n=N_1}^{N} n^{-it} \ll N/t$, since we are dealing with $c_n = -(t \log n)/2\pi$ and thus $\delta_n \approx -t/2\pi n$. By partial summation it follows that
$$\sum_{n=N_1}^{N} n^{-1/2-it} \ll \frac{N^{1/2}}{t} + \frac1t \int_{N_1}^{N} x \cdot x^{-3/2}\, dx \ll \frac{N^{1/2}}{t},$$
and thus
$$\zeta(\tfrac12 + it) = \sum_{n=1}^N n^{-1/2-it} + O(1),$$
uniformly for all $N, t$ with $t/2\pi < N \le t$. With some more work, we can (and soon will) push the upper limit of the sum further down, but not (yet?) all the way to $t^\epsilon$; as $n$ decreases, the phase $e(-(t \log n)/2\pi)$ varies more erratically, making the sum harder to control. Still, if we sum random complex numbers of norm $|c_n|$, the variance of the sum is $\sum_n |c_n|^2$, so we expect the sum to grow as the square root of that, which for $\zeta(\frac12 + it)$ would make it $\asymp \log^{1/2} t$ on average.

We shall prove this as an application of one of a series of general mean-square results of this kind, in which the summands are not independent random variables but complex exponentials with different frequencies:
$$f(t) = \sum_{\mu \in A} c_\mu\, e(\mu t)$$
for some finite set $A \subset {\bf R}$ and coefficients $c_\mu \in {\bf C}$. For example, to estimate $\int_{T_1}^{T_2} |\zeta(\sigma + it)|^2\, dt$ for some nonnegative $\sigma$ and $T_1 < T_2$, we shall take $A = \{(2\pi)^{-1} \log n : 2\pi n < T_2\}$ and $c_\mu = n^{-\sigma}$ for each $\mu = (2\pi)^{-1} \log n \in A$.

To begin with, if we fix $A$ and $c_\mu$ then clearly
$$\int_{T_1}^{T_2} |f(t)|^2\, dt = (T_2 - T_1) \sum_{\mu \in A} |c_\mu|^2 + O(1).$$
How does the $O(1)$ depend on $A$, $c_\mu$, $T_1$, $T_2$? Consider first the special case that $A$ is contained in an arithmetic progression $\{\mu_0 + n\delta : n \in {\bf Z}\}$ with common difference $\delta > 0$. Then $|e(-\mu_0 t) f(t)|^2$ is a periodic function of $t$ of period $\delta^{-1}$, and $\int_{T_1}^{T_2} |f(t)|^2\, dt = (T_2 - T_1) \sum_\mu |c_\mu|^2$ holds exactly if $(T_2 - T_1)\delta \in {\bf Z}$. It follows that for any $T_1, T_2$ we have
$$\Bigl| \int_{T_1}^{T_2} |f(t)|^2\, dt - (T_2 - T_1) \sum_{\mu \in A} |c_\mu|^2 \Bigr| < \delta^{-1} \sum_{\mu \in A} |c_\mu|^2. \qquad (1)$$
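Kuzmin's bound, and its application to the partial sums of $\zeta(\frac12+it)$, can be checked numerically. The following is a minimal sketch with illustrative parameters ($t = 500$ and the range $[N_1, N] = [100, 1000]$ are arbitrary test values chosen so that $t/2\pi < N_1$); the trivial bound on the 901-term sum is 901, while Kuzmin gives about 8.

```python
import cmath
import math

# Kuzmin's bound applied to sum of e(c_n) with c_n = -(t/2pi) log n, i.e. n^{-it}
t, N1, N = 500.0, 100, 1000          # chosen so that t/(2*pi) < N1 < N
S = sum(cmath.exp(-1j * t * math.log(n)) for n in range(N1, N + 1))

# differences delta_n = -(t/2pi) log(n/(n-1)) lie in (-1, 0) and increase with n
delta = [-(t / (2 * math.pi)) * math.log(n / (n - 1)) for n in range(N1 + 1, N + 1)]
lam = min(min(d + 1, -d) for d in delta)   # distance from the delta_n to {-1, 0}

bound = 1 / math.tan(math.pi * lam / 2)    # Kuzmin: |S| <= cot(pi*lambda/2)
print(abs(S), bound)
assert abs(S) <= bound
```

Here the monotonicity hypothesis of the Proposition holds because $|\delta_n| \approx t/2\pi n$ decreases as $n$ grows.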
Remarkably, the same inequality can be proved under the much weaker hypothesis that $|\mu - \nu| \ge \delta$ for all distinct $\mu, \nu \in A$. We follow [Vaaler 1985], who attributes the argument to Selberg in 1974.

Lemma. Let $\chi_I$ be the characteristic function of the interval $I = [T_1, T_2]$. Suppose $\beta_-, \beta_+ : {\bf R} \to {\bf R}$ are functions such that:
i) $\beta_-(z) \le \chi_I(z) \le \beta_+(z)$ for all real $z$;
ii) $\int_{-\infty}^{\infty} \beta_-(z)\, dz$ and $\int_{-\infty}^{\infty} \beta_+(z)\, dz$ converge, say to $B_-$ and $B_+$ respectively;
iii) The Fourier transforms
$$\hat\beta_-(r) = \int_{-\infty}^{\infty} \beta_-(z)\, e(rz)\, dz, \qquad \hat\beta_+(r) = \int_{-\infty}^{\infty} \beta_+(z)\, e(rz)\, dz$$
vanish for all real $r$ with $|r| \ge \delta$.
Then for every finite set $A \subset {\bf R}$ such that $|\mu - \nu| \ge \delta$ for all distinct $\mu, \nu \in A$, and any $c_\mu \in {\bf C}$, we have
$$B_- \sum_{\mu \in A} |c_\mu|^2 \le \int_{T_1}^{T_2} |f(t)|^2\, dt \le B_+ \sum_{\mu \in A} |c_\mu|^2,$$
where $f(t) = \sum_{\mu \in A} c_\mu e(\mu t)$.

Proof: By (i) we have
$$\int_{-\infty}^{\infty} |f(t)|^2 \beta_-(t)\, dt \le \int_{T_1}^{T_2} |f(t)|^2\, dt \le \int_{-\infty}^{\infty} |f(t)|^2 \beta_+(t)\, dt.$$
We expand $|f(t)|^2$ as $\sum_{\mu,\nu \in A} c_\mu \bar c_\nu\, e((\mu - \nu)t)$, and find that the lower and upper bounds are $\sum_{\mu,\nu \in A} c_\mu \bar c_\nu\, \hat\beta_\pm(\mu - \nu)$. By (ii), the main terms (with $\mu = \nu$) sum to $B_\pm \sum_\mu |c_\mu|^2$; by (iii), the cross terms (with $\mu \ne \nu$) vanish. $\Box$

It is not at all obvious that any functions $\beta_\pm$ can be found that satisfy all three conditions of the Lemma. Note that we must have $\beta_\pm(z) = \int_{-\delta}^{\delta} \hat\beta_\pm(r)\, e(-rz)\, dr$ by condition (iii) and the inversion formula for Fourier transforms, so in particular the $\beta_\pm(z)$ must extend to entire functions of $z = x + iy$ with $\beta_\pm(x+iy) \ll \exp 2\pi\delta|y|$.

We construct suitable $\beta_\pm$ as follows. The Beurling function $B(z)$ is defined by
$$B(z) := \left(\frac{\sin \pi z}{\pi}\right)^2 \left[ \frac{2}{z} + \sum_{n=0}^{\infty} \frac{1}{(n-z)^2} - \sum_{m=1}^{\infty} \frac{1}{(m+z)^2} \right]. \qquad (2)$$
This is an entire function of $z$, because the double zeros of $\sin^2 \pi z$ at $z \in {\bf Z}$ cancel the double poles of $1/(n-z)^2$ and $1/(m+z)^2$. Beurling [1938] proved:

Proposition.
i) $0 \le B(z) - {\rm sgn}(z) < 2/(\pi z)^2$ for all $z \in {\bf R}$, with $B(z) = {\rm sgn}(z)$ if and only if $z$ is a nonzero integer.
ii) $\int_{-\infty}^{\infty} \bigl(B(z) - {\rm sgn}(z)\bigr)\, dz = 1$.
iii) For $z = x + iy \in {\bf C}$ we have $B(z) - {\rm sgn}(x) \ll (1 + |z|)^{-2} \exp 2\pi|y|$; in particular, $B(z) \ll \exp 2\pi|{\rm Im}(z)|$ for all $z \in {\bf C}$.

Here ${\rm sgn}(z)$ is the sign (a.k.a. signum) of the real number $z$, equal to $+1$, $-1$, or $0$ according as $z$ is positive, negative, or zero.

Assuming this Proposition, consider the functions $\beta_\pm$ defined by
$$\beta_-(z) = -\tfrac12 \Bigl[ B\bigl(\delta(T_1 - z)\bigr) + B\bigl(\delta(z - T_2)\bigr) \Bigr], \qquad
\beta_+(z) = +\tfrac12 \Bigl[ B\bigl(\delta(z - T_1)\bigr) + B\bigl(\delta(T_2 - z)\bigr) \Bigr].$$
By (i), together with the observation that $\chi_I(z) = \bigl({\rm sgn}(z - T_1) + {\rm sgn}(T_2 - z)\bigr)/2$ for $z \ne T_1, T_2$, we have $\beta_-(z) \le \chi_I(z) \le \beta_+(z)$ for all real $z$. The same observation together with (ii) yields
$$\int_{-\infty}^{\infty} \beta_\pm(z)\, dz = \pm\delta^{-1} + \int_{-\infty}^{\infty} \chi_I(z)\, dz = T_2 - T_1 \pm \delta^{-1}.$$
Finally, by (iii) the $\beta_\pm(z)$ are analytic with $\beta_\pm(z) \ll_{\delta, T_1, T_2} (1 + |z|)^{-2} \exp 2\pi\delta\,|{\rm Im}(z)|$. Thus for $|r| \ge \delta$ we can prove $\int_{-\infty}^{\infty} \beta_\pm(z)\, e(rz)\, dz = 0$ by contour integration, moving the path of integration up if $r \ge \delta$ and down if $r \le -\delta$. This together with the preceding Lemma establishes the inequality (1) whenever $|\mu - \nu| \ge \delta$ for all distinct $\mu, \nu \in A$.

It remains to prove the Proposition on Beurling's function.

Proof: We use the well-known partial-fraction decomposition $(\pi/\sin \pi z)^2 = \sum_{n=-\infty}^{\infty} 1/(z-n)^2$, which we shall write as
$$\left(\frac{\pi}{\sin \pi z}\right)^2 = \sum_{n=0}^{\infty} \frac{1}{(n-z)^2} + \sum_{m=1}^{\infty} \frac{1}{(m+z)^2}. \qquad (3)$$

i) We have $B(0) = 1 > {\rm sgn}(0)$. For $z > 0$ we use (3) to write
$$B(z) - 1 = \left(\frac{\sin \pi z}{\pi}\right)^2 \left[ \frac{2}{z} - 2\sum_{m=1}^{\infty} \frac{1}{(m+z)^2} \right].$$
Since $1/(t+z)^2$ is a decreasing function of $t$ on $[0,\infty)$, we have
$$\int_1^{\infty} \frac{dt}{(t+z)^2} < \sum_{m=1}^{\infty} \frac{1}{(m+z)^2} < \int_0^{\infty} \frac{dt}{(t+z)^2}.$$
Thus $\sum_{m=1}^{\infty} (m+z)^{-2} \in \bigl(1/(z+1),\ 1/z\bigr)$, so $B(z) \ge 1$ with equality if and only if $z \in {\bf Z}$, and $B(z) - 1 < 2(\sin \pi z/\pi z)^2 \le 2/(\pi z)^2$. Likewise, for $z < 0$ we use
$$B(z) + 1 = \left(\frac{\sin \pi z}{\pi}\right)^2 \left[ \frac{2}{z} + 2\sum_{n=0}^{\infty} \frac{1}{(n-z)^2} \right].$$
Since $1/(t-z)^2$ is a decreasing function of $t$ on $[0,\infty)$, we have
$$\sum_{n=0}^{\infty} \frac{1}{(n-z)^2} > \int_0^{\infty} \frac{dt}{(t-z)^2} = -\frac{1}{z},$$
so $B(z) \ge -1$, again with equality if and only if $z \in {\bf Z}$. On the other hand,
$$\sum_{n=0}^{\infty} \frac{1}{(n-z)^2} = \frac{1}{z^2} + \sum_{n=1}^{\infty} \frac{1}{(n-z)^2} < \frac{1}{z^2} + \int_0^{\infty} \frac{dt}{(t-z)^2} = \frac{1}{z^2} - \frac{1}{z},$$
from which $B(z) + 1 < 2(\sin \pi z/\pi z)^2 \le 2/(\pi z)^2$. We have proved the claimed inequality whether $z$ is zero, positive, or negative.

ii) By part (i), the integral converges, and thus equals $\int_0^{\infty} \bigl(B(z) + B(-z)\bigr)\, dz$. But $(B(z) + B(-z))/2$ is simply $((\sin \pi z)/\pi z)^2$, and it is well-known that
$$\int_{-\infty}^{\infty} \left(\frac{\sin \pi z}{\pi z}\right)^2 dz = 1.$$
(For instance, one may calculate that $((\sin \pi z)/\pi z)^2 = \int_{-1}^{1} (1 - |r|)\, e(rz)\, dr$, and then use Fourier inversion.) Hence $\int_{-\infty}^{\infty} (B(z) - {\rm sgn}(z))\, dz = 1$ as claimed.

iii) We may assume $|z| > 1$. Again we use our formulas for $B(z) - 1$ or $B(z) + 1$ according as $x \ge 0$ or $x \le 0$. In the former case,
$$\frac{1}{z} - \sum_{m=1}^{\infty} \frac{1}{(m+z)^2} = \sum_{m=1}^{\infty} \int_{m-1}^{m} \left( \frac{1}{(t+z)^2} - \frac{1}{(m+z)^2} \right) dt \ll \sum_{m=1}^{\infty} \frac{1}{|m+z|^3} \ll \frac{1}{|z|^2}.$$
Likewise if $x \le 0$ we have $1/z + \sum_{n=0}^{\infty} (n-z)^{-2} \ll 1/|z|^2$. Since $|\sin \pi z|^2 \ll \exp 2\pi|y|$, the claimed inequality follows. $\Box$

The inequality (1) is not quite enough for us to prove directly that
$$\int_0^T |\zeta(\tfrac12 + it)|^2\, dt \sim T \log T, \qquad (4)$$
because so far we must approximate $\zeta(\frac12 + it)$ by a partial sum of length $N \asymp t$ to assure an $O(1)$ error, and thus must take $\delta^{-1} \asymp N \asymp T$ and get an error term in (1) proportional to the main term $T \log T$. We can nevertheless obtain (4) in two ways, by improving our estimates on exponential sums either individually or in mean square. (See also the Exercises.) For now we continue with the mean-square approach. Again we set $f(t) = \sum_{\mu \in A} c_\mu e(\mu t)$, and integrate $|f(t)|^2$ termwise. This time we write the result as
$$\int_{T_1}^{T_2} |f(t)|^2\, dt - (T_2 - T_1) \sum_{\mu \in A} |c_\mu|^2 = Q_A(\vec c_2) - Q_A(\vec c_1),$$
where $Q_A$ is the sesquilinear form on ${\bf C}^A$ defined by
$$Q_A(\vec x) = \frac{1}{2\pi i} \sum_{\substack{\mu, \nu \in A \\ \mu \ne \nu}} \frac{x_\mu \bar x_\nu}{\mu - \nu}$$
and $\vec c_j \in {\bf C}^A$ ($j = 1, 2$) are the vectors with $\mu$ coordinate $c_\mu e(\mu T_j)$. The termwise estimate $|Q_A(\vec x)| \ll \frac1\pi \sum_{\mu > \nu} |x_\mu x_\nu|/(\mu - \nu)$ is already sufficient to prove $\frac1T \int_0^T |\zeta(\frac12 + it)|^2\, dt \ll \log^2 T$. But remarkably a tighter estimate holds in this general setting. Let $\delta(\mu) = \min_\nu |\mu - \nu|$, the minimum taken over all $\nu \in A$ other than $\mu$ itself. We shall show:

Theorem (Montgomery-Vaughan Hilbert Inequality). For any finite set $A \subset {\bf R}$ and any $\vec c \in {\bf C}^A$ we have
$$|Q_A(\vec c)| \ll \sum_{\mu \in A} \frac{|c_\mu|^2}{\delta(\mu)},$$
and thus
$$\int_{T_1}^{T_2} \Bigl| \sum_{\mu \in A} c_\mu e(\mu t) \Bigr|^2 dt = \sum_{\mu \in A} \left[ T_2 - T_1 + \frac{\theta_\mu}{\delta(\mu)} \right] |c_\mu|^2 \quad \text{with } \theta_\mu \ll 1.$$

Why "Hilbert Inequality" and not simply "Inequality"? Because this is a grand generalization of the original Hilbert inequality, which is the special case $A = \{\mu \in {\bf Z} : |\mu| < M\}$. In that case our function $f(t)$ is ${\bf Z}$-periodic, and as Schur observed the inequality $|Q_A(\vec c)| < \frac12 \sum_\mu |c_\mu|^2$ follows from the integral formula
$$Q_A(\vec c) = \int_0^1 \bigl(t - \tfrac12\bigr)\, |f(t)|^2\, dt$$
(though as we've seen, in the periodic case the resulting estimate on $\int_{T_1}^{T_2} |f(t)|^2\, dt$ is even easier than the upper bound on $|Q_A(\vec c)|$). The Montgomery-Vaughan inequality does not have as precise an error bound as (1), but it has the advantage that the coefficient of $|c_\mu|^2$ is smaller when the distance from $\mu$ to the rest of $A$ greatly exceeds $\delta = \min_\mu \delta(\mu)$. For example, the formula (4) for the second moment of $\zeta(\frac12 + it)$ follows quickly from Montgomery-Vaughan: take $A = \{\log n/2\pi : n = 1, 2, 3, \ldots, N\}$ to find that
$$\int_{T_1}^{T_2} \Bigl| \sum_{n=1}^N c_n n^{-it} \Bigr|^2 dt = \sum_{n=1}^N \bigl(T_2 - T_1 + O(n)\bigr)\, |c_n|^2$$
for any $T_1, T_2, c_n$; then choose $(T_1, T_2) = (T/2,\, T)$ and $c_n = n^{-1/2}$ to find
$$\int_{T/2}^{T} \Bigl| \sum_{n=1}^N n^{-1/2-it} \Bigr|^2 dt = \frac{T}{2} \log N + O(T + N),$$
and conclude that
$$\int_{T/2}^{T} |\zeta(\tfrac12 + it)|^2\, dt = \frac{T}{2} \log T + O\bigl(T \log^{1/2} T\bigr),$$
from which (4) follows.

Proof of the Montgomery-Vaughan Hilbert inequality: Consider ${\bf C}^A$ as a finite-dimensional complex Hilbert space with inner product $\langle \vec c, \vec c\,' \rangle := \sum_\mu c_\mu \bar c\,'_\mu/\delta(\mu)$. Then $Q_A(\vec x) = \langle \vec x, L \vec x \rangle$ where $L$ is the Hermitian operator taking $\vec x$ to the vector with $\mu$ coordinate $(2\pi i)^{-1} \delta(\mu) \sum_{\nu \ne \mu} x_\nu/(\mu - \nu)$, and we want to show that $|\langle \vec c, L \vec c \rangle| \ll \langle \vec c, \vec c \rangle$ for all $\vec c \in {\bf C}^A$. But this is equivalent to the condition that $L$ have norm $O(1)$ as an operator on that Hilbert space, and since the operator is Hermitian it is enough to check that $[\,|Q_A(\vec c)| =\,]\ |\langle \vec c, L \vec c \rangle| \ll 1$ holds when $\vec c$ is a normalized eigenvector. Thus it is enough to prove that $|Q_A(\vec c)| \ll 1$ for all $A, \vec c$ such that $\sum_\mu |c_\mu|^2/\delta(\mu) = 1$ and there exists some $\lambda \in {\bf R}$ such that
$$\delta(\mu) \sum_{\nu \ne \mu} \frac{c_\nu}{\mu - \nu} = i \lambda c_\mu$$
for each $\mu \in A$, in which case $\lambda = 2\pi Q_A(\vec c)$. Now for any $\vec c$ we have
$$\bigl(2\pi Q(\vec c)\bigr)^2 = \Bigl| \sum_{\mu \ne \nu} \frac{c_\mu \bar c_\nu}{\mu - \nu} \Bigr|^2 \le \Bigl( \sum_\nu \frac{|c_\nu|^2}{\delta(\nu)} \Bigr) \Bigl( \sum_\nu \delta(\nu) \Bigl| \sum_{\mu \ne \nu} \frac{c_\mu}{\mu - \nu} \Bigr|^2 \Bigr).$$
By assumption $\sum_\nu |c_\nu|^2/\delta(\nu) = 1$. For the other factor, we expand
$$\Bigl| \sum_{\mu \ne \nu} \frac{c_\mu}{\mu - \nu} \Bigr|^2 = \sum_{\mu \ne \nu} \frac{|c_\mu|^2}{(\mu - \nu)^2} + \sum_{\substack{\mu_1 \ne \mu_2 \\ \mu_1, \mu_2 \ne \nu}} \frac{c_{\mu_1} \bar c_{\mu_2}}{(\mu_1 - \nu)(\mu_2 - \nu)}.$$
The single sum contributes
$$\sum_\mu |c_\mu|^2 \sum_{\nu \ne \mu} \frac{\delta(\nu)}{(\mu - \nu)^2}$$
to $\sum_\nu \delta(\nu) \bigl| \sum_{\mu \ne \nu} c_\mu/(\mu - \nu) \bigr|^2$; let $T_\mu$ be the inner sum $\sum_{\nu \ne \mu} \delta(\nu)/(\mu - \nu)^2$, so the above contribution is $\sum_\mu |c_\mu|^2 T_\mu$. The double sum contributes
$$\sum_{\mu_1 \ne \mu_2} c_{\mu_1} \bar c_{\mu_2} \sum_{\nu \ne \mu_1, \mu_2} \frac{\delta(\nu)}{(\mu_1 - \nu)(\mu_2 - \nu)}.$$
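Before completing the proof, the Theorem itself can be sanity-checked numerically. The sketch below evaluates $\int_{T_1}^{T_2} |\sum_\mu c_\mu e(\mu t)|^2\, dt$ termwise (exactly) for random well-spaced frequencies and compares the error term against $\sum_\mu |c_\mu|^2/\delta(\mu)$; the numerical constant 3 in the test is a generous assumption (the Theorem as stated only asserts $\theta_\mu \ll 1$), and all parameters are arbitrary test values.

```python
import cmath
import math
import random

random.seed(2)
e = lambda x: cmath.exp(2j * math.pi * x)

# random frequencies with consecutive gaps >= 1, so delta(mu) >= 1 for all mu
mu = [0.0]
for _ in range(30):
    mu.append(mu[-1] + 1.0 + 3.0 * random.random())
c = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in mu]
T1, T2 = 0.0, 5.0

# integrate |sum_mu c_mu e(mu t)|^2 over [T1, T2] termwise (an exact evaluation)
main = (T2 - T1) * sum(abs(x) ** 2 for x in c)
cross = sum(c[j] * c[k].conjugate()
            * (e((mu[j] - mu[k]) * T2) - e((mu[j] - mu[k]) * T1))
            / (2j * math.pi * (mu[j] - mu[k]))
            for j in range(len(mu)) for k in range(len(mu)) if j != k)

delta = [min(abs(m - m2) for m2 in mu if m2 != m) for m in mu]
bound = 3 * sum(abs(c[j]) ** 2 / delta[j] for j in range(len(mu)))
print(abs(cross), bound)
assert abs(cross) <= bound   # Montgomery-Vaughan error term within the bound
```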
The key trick is now to use the partial fraction decomposition
$$\frac{1}{(\mu_1 - \nu)(\mu_2 - \nu)} = \frac{1}{\mu_2 - \mu_1} \left( \frac{1}{\mu_1 - \nu} - \frac{1}{\mu_2 - \nu} \right)$$
to rewrite this last triple sum as
$$\sum_{\mu_1 \ne \mu_2} \frac{c_{\mu_1} \bar c_{\mu_2}}{\mu_2 - \mu_1} \sum_{\nu \ne \mu_1, \mu_2} \delta(\nu) \left( \frac{1}{\mu_1 - \nu} - \frac{1}{\mu_2 - \nu} \right).$$
The point is that the first part of the inner sum is almost independent of $\mu_2$, while the second half is almost independent of $\mu_1$: the other $\mu_j$ enters only as a single excluded $\nu$. That is, the triple sum is
$$\sum_{\mu_1 \ne \mu_2} \frac{c_{\mu_1} \bar c_{\mu_2}}{\mu_2 - \mu_1} \left[ \left( S(\mu_1) - \frac{\delta(\mu_2)}{\mu_1 - \mu_2} \right) - \left( S(\mu_2) - \frac{\delta(\mu_1)}{\mu_2 - \mu_1} \right) \right],$$
where
$$S(\mu) := \sum_{\nu \ne \mu} \frac{\delta(\nu)}{\mu - \nu}.$$
And now we get to use the eigenvalue hypothesis to show that the $S(\mu_j)$ terms cancel each other. Indeed we have
$$\sum_{\mu_1 \ne \mu_2} \frac{c_{\mu_1} \bar c_{\mu_2}}{\mu_2 - \mu_1}\, S(\mu_1) = \sum_{\mu_1} c_{\mu_1} S(\mu_1) \sum_{\mu_2 \ne \mu_1} \frac{\bar c_{\mu_2}}{\mu_2 - \mu_1}$$
and the inner sum is just $i\lambda\, \bar c_{\mu_1}/\delta(\mu_1)$, so
$$\sum_{\mu_1 \ne \mu_2} \frac{c_{\mu_1} \bar c_{\mu_2}}{\mu_2 - \mu_1}\, S(\mu_1) = i\lambda \sum_\mu S(\mu)\, \frac{|c_\mu|^2}{\delta(\mu)}.$$
The same computation shows that
$$\sum_{\mu_1 \ne \mu_2} \frac{c_{\mu_1} \bar c_{\mu_2}}{\mu_2 - \mu_1}\, S(\mu_2) = i\lambda \sum_\mu S(\mu)\, \frac{|c_\mu|^2}{\delta(\mu)},$$
so the $S(\mu_j)$ terms indeed drop out! Collecting the surviving terms, we are thus left with
$$\bigl(2\pi Q(\vec c)\bigr)^2 \le \sum_\mu |c_\mu|^2\, T_\mu + \sum_{\mu_1 \ne \mu_2} \bigl( \delta(\mu_1) + \delta(\mu_2) \bigr) \frac{|c_{\mu_1} c_{\mu_2}|}{(\mu_1 - \mu_2)^2}. \qquad (5)$$
By now all the coefficients are positive, so we will have no further magic cancellations and will have to just estimate how big things can get. We'll need some lemmas (which are the only place we actually use the definition of $\delta(\mu)$!): first, for each $k = 2, 3, \ldots$ and $\mu \in A$,
$$\sum_{\nu \ne \mu} \frac{\delta(\nu)}{|\mu - \nu|^k} \ll_k \delta(\mu)^{1-k}; \qquad (6)$$
second, for distinct $\mu_1, \mu_2 \in A$,
$$\sum_{\nu \ne \mu_1, \mu_2} \frac{\delta(\nu)}{(\mu_1 - \nu)^2 (\mu_2 - \nu)^2} \ll \frac{[\delta(\mu_1)]^{-1} + [\delta(\mu_2)]^{-1}}{(\mu_1 - \mu_2)^2}. \qquad (7)$$
Now the first sum in (5) is $O(1)$ because
$$T_\mu = \sum_{\nu \ne \mu} \frac{\delta(\nu)}{(\mu - \nu)^2} \ll \frac{1}{\delta(\mu)}$$
by the case $k = 2$ of (6). The second sum will be bounded by Cauchy-Schwarz. That sum is bounded by twice
$$B := \sum_{\mu_1 \ne \mu_2} \delta(\mu_1)\, \frac{|c_{\mu_1} c_{\mu_2}|}{(\mu_1 - \mu_2)^2} = \sum_{\mu \ne \nu} \delta(\mu)\, \frac{|c_\mu c_\nu|}{(\mu - \nu)^2}.$$
Since $\sum_\mu |c_\mu|^2/\delta(\mu) = 1$, we have
$$B^2 \le \sum_\nu \delta(\nu) \Bigl( \sum_{\mu \ne \nu} \frac{|c_\mu|\, \delta(\mu)}{(\mu - \nu)^2} \Bigr)^2.$$
Expanding and switching $\sum$s we rewrite this as
$$B^2 \le \sum_{\mu_1, \mu_2} |c_{\mu_1} c_{\mu_2}|\, \delta(\mu_1) \delta(\mu_2) \Bigl( \sum_{\nu \ne \mu_1, \mu_2} \frac{\delta(\nu)}{(\mu_1 - \nu)^2 (\mu_2 - \nu)^2} \Bigr).$$
When $\mu_1 = \mu_2 = \mu$, the inner sum is $\ll \delta(\mu)^{-3}$ (by (6) with $k = 4$), so the contribution of those terms is $\ll \sum_\mu |c_\mu|^2/\delta(\mu) = 1$. When $\mu_1 \ne \mu_2$ we apply (7), and the resulting estimate on the sum of the cross-terms is twice the double sum defining $B$! So, we've shown (modulo the proofs of (6), (7)) that $B^2 \ll 1 + B$. Thus $B \ll 1$ and we're finally done. $\Box$

Exercises

On the Kuzmin inequality:

1. Prove (4) in yet another way as follows. Write
$$\int_0^T \Bigl| \sum_{n=1}^N n^{-1/2-it} \Bigr|^2 dt = T \sum_{n=1}^N \frac{1}{n} + 2 \sum_{0 < n < n' \le N} (nn')^{-1/2}\, \frac{{\rm Im}\bigl[(n'/n)^{iT}\bigr]}{\log(n'/n)}.$$
For each $j = 1, 2, 3, \ldots$ use Kuzmin to obtain nontrivial bounds on
$$\sum_{n' = n + j} (nn')^{-1/2}\, (n'/n)^{iT} / \log(n'/n).$$
(This is closely related to the van der Corput bounds that will be our next topic.)
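The mean-square identity in the exercise above can be confirmed numerically before one tries to estimate with it; the following sketch (the parameters $N = 10$, $T = 20$ and the step size are arbitrary test values) compares direct quadrature of the left side with the termwise evaluation on the right.

```python
import cmath
import math

N, T, h = 10, 20.0, 1e-3   # small test case; h is the quadrature step
S = lambda t: sum(n ** -0.5 * cmath.exp(-1j * t * math.log(n))
                  for n in range(1, N + 1))

# left side: trapezoidal quadrature of |S(t)|^2 over [0, T]
M = int(round(T / h))
lhs = h * (sum(abs(S(k * h)) ** 2 for k in range(1, M))
           + (abs(S(0.0)) ** 2 + abs(S(T)) ** 2) / 2)

# right side: diagonal term plus the oscillating off-diagonal terms
diag = T * sum(1.0 / n for n in range(1, N + 1))
off = sum(2 * (n * m) ** -0.5 * ((m / n) ** (1j * T)).imag / math.log(m / n)
          for n in range(1, N + 1) for m in range(n + 1, N + 1))

print(lhs, diag + off)
assert abs(lhs - (diag + off)) < 0.01   # identity holds up to quadrature error
```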
On the Beurling function:

2. Show that $B(z) = B(z-1) - 2\pi^{-2} \sin^2 \pi z/(z^3 - z^2)$. Explain how this can be used to efficiently compute $B(z)$ to high accuracy. (There are at least two approaches, one of which works also for large and/or complex $z$.)

3. Prove that the constant $\delta^{-1}$ in (1) is best possible. Use this to show that the Beurling function minimizes $\int_{-\infty}^{\infty} (f(z) - {\rm sgn}(z))\, dz$ over all entire functions $f$ satisfying $f(z) \ge {\rm sgn}(z)$ for $z \in {\bf R}$ and $f(z) \ll \exp 2\pi|{\rm Im}(z)|$ for all $z \in {\bf C}$. Beurling showed that in fact $B(z)$ is the unique minimizing function. The same argument shows that if $\delta(T_2 - T_1)$ is a positive integer then our $\beta_\pm$ are optimal; but they are not unique: see [GV 1981, p.289] for Selberg's description of all the optimal $\beta_\pm$. When $\delta(T_2 - T_1) \notin {\bf Z}$, the best $\beta_\pm$ are slightly better than those constructed from Beurling's function; Logan [1977] found the optimal $\beta_\pm$ in this case and proved their uniqueness.

4. (A further application of $\beta_+$ to mean-square bounds on exponential sums; look up "Large Sieve" in [Selberg 1969] for the context.)
i) Suppose $T_2 - T_1 \in \delta^{-1}{\bf Z}$. Prove that there exists an entire function $f$ such that $\beta_+(z) = |f(z)|^2$ for all $z \in {\bf R}$ and $f(z) \ll \exp \pi\delta\,|{\rm Im}(z)|$ for all $z \in {\bf C}$, and thus that $f|_{\bf R}$ is an $L^2$ function with $\hat f$ supported on $|r| \le \delta/2$. [Hint: a polynomial $P \in {\bf R}[x]$ is nonnegative for all real $x$ if and only if $P = |Q|^2$ for some $Q \in {\bf C}[x]$. According to [Vaaler 1985], the result holds even without the hypothesis that $T_2 - T_1 \in \delta^{-1}{\bf Z}$, using a theorem of Fejér.]
ii) Now let $S(x)$ be a trigonometric polynomial of the form $\sum_{n=T_1}^{T_2} c_n e(nx)$ for some complex numbers $c_n$ ($T_1 \le n \le T_2$), and set $S_1(x) = \sum_{n=T_1}^{T_2} f(n)\, c_n e(nx)$. Then $S_1$ is the convolution of $S$ with $\hat f$, so
$$|S_1(x)|^2 \le \Bigl( \int_{-\delta/2}^{\delta/2} |\hat f(u)|^2\, du \Bigr) \int_{-\delta/2}^{\delta/2} |S(x+u)|^2\, du = (T_2 - T_1 + \delta^{-1}) \int_{-\delta/2}^{\delta/2} |S(x+u)|^2\, du.$$
Conclude that if ($\delta \le 1$ and) $A \subset {\bf R}/{\bf Z}$ is any finite set such that $\|x - x'\| \ge \delta$ for all distinct $x, x' \in A$ then
$$\sum_{x \in A} |S(x)|^2 < (T_2 - T_1 + \delta^{-1}) \sum_{n=T_1}^{T_2} |c_n|^2.$$
Explain why this is elementary when $A \subseteq x_0 + (R^{-1}{\bf Z}/{\bf Z})$ for some integer $R$.

On the generalized Hilbert inequality:

5.
Prove that the constant $\pi$ in the original Hilbert inequality is best possible, and show that it holds even if $c_\mu$ is allowed to be nonzero for every integer $\mu$ (this is in fact what Hilbert originally proved).
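For the original Hilbert inequality $|\sum_{\mu \ne \nu} c_\mu \bar c_\nu/(\mu - \nu)| \le \pi \sum_\mu |c_\mu|^2$, a quick numerical experiment (random coefficients; the size 40 and the seed are arbitrary) illustrates both the bound and the fact that the bilinear form is purely imaginary.

```python
import math
import random

random.seed(1)
M = 40
c = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(M)]

# the classical Hilbert bilinear form: sum over m != n of c_m conj(c_n)/(m - n)
form = sum(c[m] * c[n].conjugate() / (m - n)
           for m in range(M) for n in range(M) if m != n)

# the real antisymmetric kernel 1/(m - n) makes the form purely imaginary
print(abs(form.real) < 1e-9)
print(abs(form) <= math.pi * sum(abs(x) ** 2 for x in c))   # Hilbert's inequality
```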
6. More generally, prove that the norm of $Q_A$ relative to the standard inner product $\langle \vec c, \vec c\,' \rangle = \sum_\mu c_\mu \bar c\,'_\mu$ is less than $(2\delta)^{-1}$.

7. Deduce the second-moment estimate (4), albeit with a slightly worse error bound than we proved using Montgomery-Vaughan, from the generalized Hilbert inequality of the preceding exercise, as follows: write $\sum_{n=1}^N n^{-1/2-it} = f_1(t) + f_2(t)$, where $f_1(t) = \sum_{n=1}^A n^{-1/2-it}$ and $f_2(t) = \sum_{n=A+1}^N n^{-1/2-it}$; use the preceding exercise to estimate $\int_{T/2}^T |f_1(t)|^2\, dt$ and $\int_{T/2}^T |f_2(t)|^2\, dt$; and then use $\|f_1 + f_2\| = \|f_1\| + O(\|f_2\|)$ (triangle inequality in $L^2(T/2, T)$).

On the Montgomery-Vaughan inequality:

8. Complete the proof by verifying the inequalities (6), (7).

9. Let $\chi$ be a character (primitive or not) mod $q$. Obtain an asymptotic formula for $\int_0^T |L(\frac12 + it, \chi)|^2\, dt$. How does the error term depend on $q$? (It is conjectured that $L(\frac12 + it, \chi) \ll_\epsilon (q|t|)^\epsilon$; naturally this problem is still wide open: it has the Lindelöf conjecture as a special case.)

References

[Beurling 1938] Beurling, A.: Sur les intégrales de Fourier absolument convergentes et leur application à une transformation fonctionnelle, Neuvième Congrès des Mathématiciens Scandinaves, Helsingfors, 1938.

[GV 1981] Graham, S.W., Vaaler, J.D.: A class of extremal functions for the Fourier transform, Trans. Amer. Math. Soc. 265 (1981), 283-302.

[HLP] Hardy, G.H., Littlewood, J.E., Pólya, G.: Inequalities. Cambridge Univ. Press, 1967 (2nd ed.).

[Logan 1977] Logan, B.F.: Bandlimited functions bounded below over an interval, Notices Amer. Math. Soc. 24 (1977), A-331.

[MV 1974] Montgomery, H.L., Vaughan, R.C.: Hilbert's Inequality, J. London Math. Soc. (2) 8 (1974), 73-82.

[Vaaler 1985] Vaaler, J.D.: Some extremal functions in Fourier analysis, Bull. Amer. Math. Soc. (N.S.) 12 (1985) #2, 183-216.