Tight Bound for the Density of Sequence of Integers the Sum of No Two of which is a Perfect Square

DIMACS Technical Report 2000-39
April 200

by

Ayman Khalfalah
Email: akhalfal@cs.rutgers.edu
Address: Dept. of Computer Science, Rutgers University, New Brunswick, New Jersey 08903

Sachin Lodha
Email: lodha@cs.rutgers.edu
Address: Dept. of Computer Science, Rutgers University, New Brunswick, New Jersey 08903

Endre Szemerédi
Email: szemered@cs.rutgers.edu
Address: Dept. of Computer Science, Rutgers University, New Brunswick, New Jersey 08903

This work was supported by DIMACS, which was funded by the NSF under grant STC 91-19999. DIMACS is a partnership of Rutgers University, Princeton University, AT&T Labs-Research, Bell Labs, NEC Research Institute and Telcordia Technologies (formerly Bellcore). DIMACS was founded as an NSF Science and Technology Center, and also receives support from the New Jersey Commission on Science and Technology.

ABSTRACT. P. Erdős and D. Silverman [EG-80] proposed the problem of determining the maximal density attainable by a set $S$ of positive integers having the property that no two distinct elements of $S$ sum up to a perfect square. J. P. Massias exhibited such a set, consisting of all $x \equiv 1 \pmod 4$ together with $x \equiv 14, 26, 30 \pmod{32}$, in [M]. In [LOS-82], J. C. Lagarias, A. M. Odlyzko and J. B. Shearer showed that for any positive integer $n$, one cannot find more than $\frac{11}{32} n$ residue classes (mod $n$) such that the sum of any two is never congruent to a square (mod $n$), thus essentially proving that the Massias set has the best possible density. They also proved [LOS-83] that the density of such a set $S$ is never more than 0.475 when we allow general sequences. We improve on the upper bound for general sequences, essentially proving that it is not 0.475 but arbitrarily close to $\frac{11}{32}$, the same as that for sequences made up only of arithmetic progressions.

1 Introduction

P. Erdős and D. Silverman [EG-80] proposed the problem of determining the maximal density attainable by a set $S = \{s_i\}$ of positive integers having the following property:

Property NS: for all $i \ne j$, $s_i + s_j$ is not a perfect square.

J. P. Massias exhibited such a set, consisting of all $x \equiv 1 \pmod 4$ together with $x \equiv 14, 26, 30 \pmod{32}$, in [M]. Its density is $\frac{11}{32}$. In [LOS-82], J. C. Lagarias, A. M. Odlyzko and J. B. Shearer proved the following theorem.

Theorem 1 Let $S$ be a union of arithmetic progressions (mod $M$) having the property NS. Then the density $d(S) \le \frac{11}{32}$, with equality possible if and only if $32 \mid M$. For all other $M$, $d(S) \le \frac{1}{3}$.

Therefore, they essentially showed that the Massias set has the best possible density if $S$ is required to be a union of arithmetic progressions. The same authors [LOS-83] also proved

Theorem 2 Let $S$ denote a finite set with all elements $\le N$ which has property NS, and let $d(N) = \max \frac{|S|}{N}$. Then there exists an absolute constant $N_0$ such that for all $N > N_0$, $d(N) \le 0.475$.

This is an improvement over the trivial upper bound of 0.5 on $d(N)$, and yet there is a wide gap between the results of Theorem 1 and Theorem 2. One must also note that the behavior of sets $S$ having the property NS is quite different from that of sets $S$ having the following property:

Property DS: for all $i \ne j$, $s_i - s_j$ is not a perfect square.

In [S-78], A. Sárközy proved that any set $S$ having property DS has density 0; to be precise, it has at most $O\big(x(\log\log x)^{2/3}/(\log x)^{1/3}\big)$ elements $\le x$.

In this paper, we strengthen Theorem 2 and show that the bound in Theorem 1 applies in the general case too.

Theorem 3 Let $S$ denote a finite set with all elements $\le N$ which has property NS, and let $d(N) = \max \frac{|S|}{N}$. Then, for any positive real number $\delta$, there exists an absolute constant $N_0(\delta)$ such that for all $N > N_0(\delta)$, $d(N) < \frac{11}{32} + \delta$.
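The Massias construction can be checked mechanically. The following is a minimal sketch, not part of the original report: it verifies that the 11 residues modulo 32 never sum to a square modulo 32, and brute-forces property NS on an initial segment. The function names and the cut-off used in the usage note are illustrative assumptions.

```python
from math import isqrt

# Residues of the Massias set modulo 32: x = 1 (mod 4), plus 14, 26, 30.
MASSIAS_RESIDUES = sorted({r for r in range(32) if r % 4 == 1} | {14, 26, 30})


def squares_mod(m):
    """Return the set of squares (including 0) modulo m."""
    return {(x * x) % m for x in range(m)}


def residues_avoid_squares(residues, m):
    """True if no sum of two residues (repeats allowed) is a square mod m."""
    sq = squares_mod(m)
    return all((a + b) % m not in sq for a in residues for b in residues)


def has_property_ns(elements):
    """Brute force: no two distinct elements sum to a perfect square."""
    elems = sorted(elements)
    for i, x in enumerate(elems):
        for y in elems[i + 1:]:
            s = x + y
            if isqrt(s) ** 2 == s:
                return False
    return True


def massias_prefix(limit):
    """All elements of the Massias set in [1, limit]."""
    res = set(MASSIAS_RESIDUES)
    return [x for x in range(1, limit + 1) if x % 32 in res]
```

For instance, `residues_avoid_squares(MASSIAS_RESIDUES, 32)` checks the congruence condition, and `len(massias_prefix(32 * k)) == 11 * k` reflects the density $\frac{11}{32}$.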

2 Outline of Proof

We sketch the outline of our proof in this section, using the notation of Section 3. Given a subset $S$ of $[N]$ of density $\frac{11}{32} + \delta$, the number of solutions to the equation $x + y = z^2$ with $x, y \in S$ is equal to

$$\sum_{t=0}^{\cdots} f_{SQ}\left(\frac{t}{\cdots}\right) f_S\left(\frac{t}{\cdots}\right) f_S\left(\frac{t}{\cdots}\right). \qquad (1)$$

If there is no solution to this equation, then the sum in equation (1) must be 0.

Let $M$ be a highly composite number. We shift $S$ by $jM$, $0 \le j \le N^{1-\epsilon}$, and then count the number of solutions to the equation $x + y = z^2$ with $x \in S$ and $y \in S + jM$, for each $j$. We show that this count is almost the same as in equation (1): the average error is $O(N\sqrt{N}/\sqrt{P})$, where $P$ is the largest prime divisor of $M$.

Then we partition $[N]$ into $M$ residue classes mod $M$ and observe how well $S$ gets partitioned into these different pieces of $[N]$. If $S$ is well distributed in these residue classes, then the average number of solutions to the equation $x + y = z^2$ with $x \in S$ and $y \in S + jM$ (averaging over $0 \le j \le N^{1-\epsilon}$) is $\Omega(N\sqrt{N}/M^2)$. But this is not good enough to get a contradiction!

In fact, we will show that we can combinatorially get a highly composite $M$ such that there exist quite a few pairs of well-distributed dense residue classes modulo $M$ which add up to a quadratic residue modulo $M$. Shifting gives at most $O(N\sqrt{N}/\sqrt{P})$ average error in the analytical counting, and the combinatorial counting estimates the average number of solutions to be $\Omega(N\sqrt{N}/\log^2 P)$. The last two statements give us a contradiction.

3 Notation

Let $N$ be a sufficiently large positive integer and let $S$ be a subset of $[N]$ of density $\frac{11}{32} + \delta$, $\delta$ being a positive constant. Let $l = \frac{56}{\delta^3}$. For $1 \le i \le l$, let us define primes $p_i$ in the following way:

$$p_1 = 7, \qquad p_{i+1} \ge 2^{6^{p_i}}.$$

Using these primes, we define $l$ numbers, $q_1$ through $q_l$:

$$q_0 = 1, \qquad q_{i+1} = \prod_{p \text{ prime},\; p \le p_{i+1},\; p^{\alpha_p} \le p_{i+1}^2} p^{\alpha_p}.$$

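Let $\sigma = \dfrac{\left(\frac{11}{32}+\delta\right) - \left(\frac{11}{32}+\delta\right)^2}{l}$. Note that $\sigma \le \frac{1}{4l} = \frac{\delta^3}{224}$.

Let $\epsilon_{i,j}(S)$ be the density of the set $S$ in residue class $j$ modulo $q_i$. That is,

$$\epsilon_{i,j}(S) = \frac{|\{s \in S : s \equiv j \bmod q_i\}|}{N/q_i}.$$

Since we will be working with some fixed subset $S$ of $[N]$, we will drop $S$ from the above notation and just use $\epsilon_{i,j}$ throughout this paper.

We use the abbreviation $e(\alpha) = e^{2\pi i \alpha}$. We denote the trigonometrical sum over a set $X$ of integers as $f_X(\alpha) = \sum_{x \in X} e(\alpha x)$. Let $SQ$ be the set of all perfect squares which are less than or equal to $\cdots$. Note that

$$f_{SQ}(\alpha) = \sum_{s \in SQ} e(\alpha s) = \sum_{x=0}^{\cdots} e(\alpha x^2).$$

For the sake of simplicity of writing, and without loss of generality, we may assume that $\cdots$ is an integer. Notice that given any subsets $A$ and $B$ of $[N]$, the number of triplets $(x, y, z)$ such that $x \in A$, $y \in B$ and $x + y = z^2$ is

$$\sum_{t=0}^{\cdots} f_{SQ}\left(\frac{t}{\cdots}\right) f_A\left(\frac{t}{\cdots}\right) f_B\left(\frac{t}{\cdots}\right).$$

4 Definitions

Definition 1 $\alpha_i = \dfrac{1}{q_i} \sum_{j=0}^{q_i - 1} \epsilon_{i,j}^2$.

Definition 2 The residue class $j$ is full if $\epsilon_{i,j} = 1 - o(1)$.

Definition 3 The residue class $j$ is bad if

$$\left|\left\{k : 0 \le k < \frac{q_{i+1}}{q_i} \text{ and } |\epsilon_{i,j} - \epsilon_{i+1,\,j+kq_i}| \ge \delta/4 \right\}\right| \ge \frac{q_{i+1}/q_i}{8}.$$

Definition 4 The residue class $j$ is good if it is not bad.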
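The triple-counting formula from the Notation section is an instance of the orthogonality of additive characters. The sketch below illustrates it on small sets by comparing a direct count with the trigonometric-sum count. The modulus `2 * N + 1`, the minus signs on the arguments of $f_A$ and $f_B$, and the normalisation by the modulus are assumptions chosen here to make the identity exact; they are not taken from the text.

```python
import cmath
from math import isqrt


def e(alpha):
    """e(alpha) = exp(2*pi*i*alpha)."""
    return cmath.exp(2j * cmath.pi * alpha)


def trig_sum(X, alpha):
    """f_X(alpha) = sum over x in X of e(alpha * x)."""
    return sum(e(alpha * x) for x in X)


def count_direct(A, B):
    """Count triples (x, y, z), z >= 0, with x in A, y in B, x + y = z^2."""
    count = 0
    for x in A:
        for y in B:
            s = x + y
            if isqrt(s) ** 2 == s:
                count += 1
    return count


def count_by_sums(A, B, N):
    """Same count via trigonometric sums; A and B must be subsets of [1, N]."""
    modulus = 2 * N + 1  # assumption: any modulus larger than 2N works
    squares = [z * z for z in range(isqrt(2 * N) + 1)]
    total = 0
    for t in range(modulus):
        alpha = t / modulus
        total += (trig_sum(squares, alpha)
                  * trig_sum(A, -alpha)
                  * trig_sum(B, -alpha))
    # The inner sum over t equals modulus exactly when z^2 = x + y.
    return round((total / modulus).real)
```

With `A = list(range(1, 41, 3))`, `B = list(range(2, 41, 5))` and `N = 40`, both functions return the same count.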

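5 Lemmas

Lemma 1 $q_{i+1} \le 6^{p_{i+1}}$ and $2^{q_i} \le p_{i+1}$.

Proof: This is easily done by induction and is left as an exercise to the reader.

Lemma 2 $\alpha_{i+1} \ge \alpha_i$.

Proof:

$$\alpha_{i+1} - \alpha_i = \frac{1}{q_i} \sum_{j=0}^{q_i-1} \frac{1}{2\,(q_{i+1}/q_i)^2} \sum_{k,k'=0}^{q_{i+1}/q_i - 1} \left(\epsilon_{i+1,\,j+kq_i} - \epsilon_{i+1,\,j+k'q_i}\right)^2 \ge 0.$$

Lemma 3 For all $i$, $\left(\frac{11}{32} + \delta\right)^2 \le \alpha_i \le \frac{11}{32} + \delta$.

Proof: Using the Cauchy-Schwarz inequality, we get

$$\alpha_i = \frac{1}{q_i}\sum_{j} \epsilon_{i,j}^2 \ge \frac{\left(\sum_{j} \epsilon_{i,j}\right)^2}{q_i^2} = \left(\frac{11}{32} + \delta\right)^2.$$

Since $0 \le \epsilon_{i,j} \le 1$, we get

$$\alpha_i = \frac{1}{q_i}\sum_{j} \epsilon_{i,j}^2 \le \frac{1}{q_i}\sum_{j} \epsilon_{i,j} = \frac{11}{32} + \delta.$$

Notice that $\alpha_i = \frac{11}{32} + \delta$ would mean that $\left(\frac{11}{32} + \delta\right) q_i$ residue classes modulo $q_i$ are full.

Lemma 4 (Improved Cauchy-Schwarz Inequality) If, for integers $0 < m < n$,

$$\frac{\sum_{k=1}^{m} x_k}{m} = \frac{\sum_{k=1}^{n} x_k}{n} + \gamma,$$

then

$$\sum_{k=1}^{n} x_k^2 \ge \frac{\left(\sum_{k=1}^{n} x_k\right)^2}{n} + \frac{\gamma^2 mn}{n-m}.$$

Proof: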

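$$\sum_{k=1}^{n} x_k^2 = \sum_{k=1}^{m} x_k^2 + \sum_{k=m+1}^{n} x_k^2 \ge \frac{\left(\sum_{k=1}^{m} x_k\right)^2}{m} + \frac{\left(\sum_{k=m+1}^{n} x_k\right)^2}{n-m}$$

$$= \left( \frac{m\left(\sum_{k=1}^{n} x_k\right)^2}{n^2} + \frac{2m\gamma \sum_{k=1}^{n} x_k}{n} + m\gamma^2 \right) + \left( \frac{(n-m)\left(\sum_{k=1}^{n} x_k\right)^2}{n^2} - \frac{2m\gamma \sum_{k=1}^{n} x_k}{n} + \frac{m^2\gamma^2}{n-m} \right)$$

$$= \frac{\left(\sum_{k=1}^{n} x_k\right)^2}{n} + \frac{\gamma^2 mn}{n-m}.$$

Lemma 5 If $\alpha_{i+1} - \alpha_i < \sigma$, then the number of bad residue classes mod $q_i$ is less than $\frac{\delta q_i}{2}$.

Proof:

$$\alpha_{i+1} - \alpha_i = \frac{1}{q_{i+1}} \sum_{j=0}^{q_i - 1} \left( \sum_{k=0}^{q_{i+1}/q_i - 1} \epsilon_{i+1,\,j+kq_i}^2 - \frac{\left(\sum_{k=0}^{q_{i+1}/q_i - 1} \epsilon_{i+1,\,j+kq_i}\right)^2}{q_{i+1}/q_i} \right).$$

Let the number of bad residue classes mod $q_i$ be $B$. Then, using Lemma 4 with $n = \frac{q_{i+1}}{q_i}$, $m = \frac{n}{8}$ and $\gamma = \frac{\delta}{4}$, we get

$$\alpha_{i+1} - \alpha_i \ge \frac{1}{q_{i+1}} \cdot B \cdot \frac{q_{i+1}}{q_i} \cdot \frac{\delta^2}{112} = \frac{B\delta^2}{112\, q_i}.$$

Since this difference is less than $\sigma$, we get

$$B < \frac{112\,\sigma q_i}{\delta^2} \le \frac{\delta q_i}{2}.$$

Lemma 6 If $\alpha_{i+1} - \alpha_i < \sigma$, then there exists a pair of good residue classes, say $a$ and $b$, modulo $q_i$ such that

1. $\epsilon_{i,a} \ge \frac{\delta}{2}$,
2. $\epsilon_{i,b} \ge \frac{\delta}{2}$,
3. $a + b$ is a quadratic residue modulo $q_i$.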
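Lemma 4 above lends itself to a quick numerical sanity check. The sketch below computes $\gamma$ from the data as the difference between the mean of the first $m$ entries and the overall mean, and reports the slack between the two sides of the inequality. The function names and the random test parameters are illustrative assumptions.

```python
import random


def lemma4_slack(xs, m):
    """Return sum(x_k^2) - [ (sum x_k)^2 / n + gamma^2 * m * n / (n - m) ].

    gamma is defined by: mean of the first m entries = overall mean + gamma.
    Lemma 4 asserts that the returned value is non-negative for 0 < m < n.
    """
    n = len(xs)
    total = sum(xs)
    gamma = sum(xs[:m]) / m - total / n
    lhs = sum(x * x for x in xs)
    rhs = total ** 2 / n + gamma ** 2 * m * n / (n - m)
    return lhs - rhs


def check_random(trials=1000, seed=0):
    """Smallest slack observed over random inputs in [0, 1)."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        n = rng.randint(2, 40)
        m = rng.randint(1, n - 1)
        xs = [rng.random() for _ in range(n)]
        worst = min(worst, lemma4_slack(xs, m))
    return worst
```

Equality holds when the entries are constant on each of the two blocks, for example `lemma4_slack([1, 1, 0, 0], 2)` is 0.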

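Proof: Since $\alpha_{i+1} - \alpha_i < \sigma$, Lemma 5 says that the number of bad residue classes modulo $q_i$ is less than $\frac{\delta q_i}{2}$. So the number of good residue classes is at least $\left(1 - \frac{\delta}{2}\right) q_i$. Also note that

$$\sum_{k \text{ is good}} \epsilon_{i,k} \ge \left(\frac{11}{32} + \delta\right) q_i - \frac{\delta q_i}{2} = \left(\frac{11}{32} + \frac{\delta}{2}\right) q_i.$$

Let $G = \{k : k \text{ is good and } \epsilon_{i,k} \ge \frac{\delta}{2}\}$. If $|G| > q_i/2$, then we have the required pair $a$ and $b$ simply by the pigeonhole principle. We will now show that $|G| > \frac{11}{32} q_i$. Therefore, such a pair must exist in the light of Theorem 1. If $|G| \le \frac{11}{32} q_i$, then

$$\sum_{k \text{ is good}} \epsilon_{i,k} \le |G| + (q_i - |G|)\,\frac{\delta}{2} \le \left(\frac{11}{32} + \frac{21\delta}{64}\right) q_i,$$

contradicting the lower bound we got above.

Lemma 7 Let $z$ be a quadratic residue modulo $q$, and let $n$ be a sufficiently large positive integer. Then the number of perfect squares in any interval of length $H$ of $[n] \cap \{x : x \equiv z \bmod q\}$ is at least $\frac{H}{4\sqrt{n}}$.

Proof: Let $z \equiv k^2 \bmod q$. Suppose the interval is $[r, r+q, r+2q, \ldots, r+(H-1)q]$, where $0 \le r \le n - (H-1)q$. So we want to count the number of integers $l$ such that $r \le (k + lq)^2 \le r + (H-1)q$. It follows that the number of such $l$ that satisfy these inequalities is at least

$$\frac{\left(\sqrt{r+(H-1)q} - k\right) - \left(\sqrt{r} - k\right)}{2q} = \frac{(H-1)q}{2q\left(\sqrt{r+(H-1)q} + \sqrt{r}\right)} \ge \frac{H}{4\sqrt{n}}.$$

Lemma 8 (Shifting Lemma) Let $S \subseteq [N]$ be such that the number of solutions to the equation $x + y = z^2$ with $x, y \in S$ is equal to 0. Let $P$ be a big prime, $C$ an integer constant, and let

$$M = C \prod_{p \text{ prime},\; p \le P,\; p^{\alpha_p} \le P^2} p^{\alpha_p}.$$

Then the number of solutions with $x \in S$ and $y \in S + jM$ (for $0 \le j \le N^{1-2\epsilon}$, $\epsilon > 0$) is at most $O(N\sqrt{N}/\sqrt{P})$.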
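The counting argument behind Lemma 7 above can be tried out numerically. The sketch below counts perfect squares in a progression $r, r+q, \ldots, r+(H-1)q$ directly and compares the count with $(\sqrt{r+(H-1)q} - \sqrt{r})/q - 1$, which is what a single fixed square root $k$ of $z$ guarantees once rounding is accounted for. The function names and the sample parameters in the usage note are assumptions made for illustration.

```python
from math import isqrt, sqrt


def squares_in_progression(r, q, H):
    """Count perfect squares among r, r + q, ..., r + (H - 1) * q."""
    return sum(1 for i in range(H) if isqrt(r + i * q) ** 2 == r + i * q)


def single_root_lower_bound(r, q, H):
    """Lower bound from one root k of z mod q.

    The integers l with r <= (k + l*q)^2 <= r + (H-1)*q fill an interval of
    length (sqrt(r + (H-1)*q) - sqrt(r)) / q, so at least that many minus
    one of them exist, and each yields a distinct square in the progression.
    """
    last = r + (H - 1) * q
    return (sqrt(last) - sqrt(r)) / q - 1
```

For example, with `q = 7`, `r = 7002` (so the progression sits in the class of the quadratic residue 2 mod 7), `H = 100000` and `n = 10**6`, the direct count exceeds both `single_root_lower_bound(r, q, H)` and `H / (4 * sqrt(n))`.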

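Proof: Given a set $S \subseteq [N]$, the number of solutions to the equation $x_1 + x_2 = z^2$ with $x_1, x_2 \in S$ is given by equation (1),

$$\sum_{t=0}^{\cdots} f_{SQ}\left(\frac{t}{\cdots}\right) f_S\left(\frac{t}{\cdots}\right) f_S\left(\frac{t}{\cdots}\right).$$

If there is no solution to this equation, then the sum in equation (1) must be 0.

We would like to estimate $f_{SQ}\left(\frac{t}{\cdots}\right)$ by approximating $\frac{t}{\cdots}$ by a fraction $\frac{a(t)}{q(t)}$. Using Farey fractions with denominator $q(t) \le Q = N^{1-\epsilon}$, and letting $r$ be the error in the approximation, we get

$$\frac{t}{\cdots} = \frac{a(t)}{q(t)} + r, \qquad |r| \le \frac{1}{q(t)\,N^{1-\epsilon}},$$

$$f_{SQ}\left(\frac{t}{\cdots}\right) = \sum_{x=0}^{\cdots} e\left(x^2\,\frac{t}{\cdots}\right) = \sum_{x=0}^{\cdots} e\left(x^2\left(\frac{a(t)}{q(t)} + r\right)\right).$$

Now we replace $x$ by $j + kq(t)$, where $j$ runs from 0 to $q(t) - 1$ and $k$ runs from 0 to $\cdots$. We get $x^2 = j^2 + 2kjq(t) + k^2q(t)^2$ and

$$f_{SQ}\left(\frac{t}{\cdots}\right) = \sum_{k=0}^{\cdots} \sum_{j=0}^{q(t)-1} e\left(\left(j^2 + 2kjq(t) + k^2q(t)^2\right)\left(\frac{a(t)}{q(t)} + r\right)\right)$$

$$= \sum_{k=0}^{\cdots} e\left(k^2 q(t) a(t)\right) e\left(k^2 q(t)^2 r\right) \sum_{j=0}^{q(t)-1} e\left(2kja(t)\right) e\left(j^2\,\frac{a(t)}{q(t)}\right) e\left(\left(j^2 + 2kjq(t)\right) r\right).$$

Now, using the fact that $e(l) = 1$ when $l$ is an integer, we get

$$f_{SQ}\left(\frac{t}{\cdots}\right) = \sum_{k=0}^{\cdots} e\left(k^2 q(t)^2 r\right) \sum_{j=0}^{q(t)-1} e\left(j^2\,\frac{a(t)}{q(t)}\right) e\left(\left(j^2 + 2kjq(t)\right) r\right). \qquad (2)$$

Assuming that $q(t) \le \cdots$, we get

$$\left|\left(j^2 + 2kjq(t)\right) r\right| \le 3N^{(-0.5+\epsilon)} \quad\text{and}\quad \left|e\left(\left(j^2 + 2kjq(t)\right) r\right)\right| \le 1 + 3N^{(-0.5+\epsilon)}.$$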

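Therefore,

$$\left|\sum_{j=0}^{q(t)-1} e\left(j^2\,\frac{a(t)}{q(t)}\right) e\left(\left(j^2 + 2kjq(t)\right) r\right)\right| \le \left|\sum_{j=0}^{q(t)-1} e\left(j^2\,\frac{a(t)}{q(t)}\right)\right| + 3q(t)N^{(-0.5+\epsilon)}. \qquad (3)$$

In equation (3) we used the fact that the maximum absolute value of $e\left(j^2 \frac{a(t)}{q(t)}\right)$ is 1. We divided the sum into two parts: the first part is the basic part and the second is the error. To find the maximum value of the error, we multiplied it by 1 instead of $e\left(j^2 \frac{a(t)}{q(t)}\right)$.

Using Gauss sums, we get $\left|\sum_{j=0}^{q(t)-1} e\left(j^2 \frac{a(t)}{q(t)}\right)\right| = O\left(\sqrt{q(t)}\right)$. Now, using equation (2) and equation (3), we get equation (4), which is true for any $q(t) \le \cdots$ (notice that for $q(t) > \cdots$ we get that $f_{SQ}\left(\frac{t}{\cdots}\right)$ is $o(\cdots)$ using Lemma 2.4 on page 2 of [V-97]):

$$\left|f_{SQ}\left(\frac{t}{\cdots}\right)\right| \le \sum_{k=0}^{\cdots} \left|e\left(k^2 q(t)^2 r\right)\right| \left(O\left(\sqrt{q(t)}\right) + 3q(t)N^{(-0.5+\epsilon)}\right) = O\left(\frac{\sqrt{N}}{\sqrt{q(t)}}\right). \qquad (4)$$

We can divide the sum in equation (1) into two parts, based on the value of $q(t)$ compared to $P$:

$$\Sigma_1 = \sum_{t:\; q(t) \le P} f_{SQ}\left(\frac{t}{\cdots}\right) f_S\left(\frac{t}{\cdots}\right) f_S\left(\frac{t}{\cdots}\right), \qquad \Sigma_2 = \sum_{t:\; P < q(t)} f_{SQ}\left(\frac{t}{\cdots}\right) f_S\left(\frac{t}{\cdots}\right) f_S\left(\frac{t}{\cdots}\right).$$

We want to upper bound $\Sigma_2$, so using equation (4) and Parseval's identity, we get

$$|\Sigma_2| \le \cdots \sum_{t} \left|f_S\left(\frac{t}{\cdots}\right)\right|^2 = O\left(\frac{N\sqrt{N}}{\sqrt{P}}\right).$$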
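The Gauss-sum step used above relies on a classical evaluation: for $\gcd(a, q) = 1$, the complete sum $\sum_{j=0}^{q-1} e(aj^2/q)$ has squared absolute value $q$ when $q$ is odd, $0$ when $q \equiv 2 \pmod 4$, and $2q$ when $4 \mid q$, so it never exceeds $\sqrt{2q}$ in absolute value. The sketch below checks this numerically. The function names are illustrative and do not come from the report.

```python
import cmath
from math import gcd


def gauss_sum(a, q):
    """Quadratic Gauss sum: sum over j in [0, q) of e(a * j^2 / q)."""
    # Reducing a*j*j modulo q first keeps the floating-point argument small.
    return sum(cmath.exp(2j * cmath.pi * ((a * j * j) % q) / q)
               for j in range(q))


def expected_abs_squared(q):
    """Classical value of |G(a, q)|^2 when gcd(a, q) = 1."""
    if q % 2 == 1:
        return q
    if q % 4 == 2:
        return 0
    return 2 * q


def max_deviation(q_max):
    """Largest gap between |G(a, q)|^2 and the classical value, q <= q_max."""
    worst = 0.0
    for q in range(1, q_max + 1):
        for a in range(1, q + 1):
            if gcd(a, q) == 1:
                gap = abs(abs(gauss_sum(a, q)) ** 2 - expected_abs_squared(q))
                worst = max(worst, gap)
    return worst
```

Calling `max_deviation(60)` returns a value at the level of floating-point rounding error.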

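Now, since $\Sigma_1 + \Sigma_2 = 0$ by the assumption that there is no solution, we conclude that

$$|\Sigma_1| = O\left(\frac{N\sqrt{N}}{\sqrt{P}}\right). \qquad (5)$$

We want to show that the number of solutions to $x_1 + x_2 = z^2$ with $x_1 \in S$ and $x_2 \in S + jM$, where $j \le N^{1-2\epsilon}$, is close to the number of solutions to our problem. To do that, we need only consider how much $\Sigma_1$ and $\Sigma_2$ will change if we replace equation (1) by equation (6):

$$\sum_{t=0}^{\cdots} f_{SQ}\left(\frac{t}{\cdots}\right) f_S\left(\frac{t}{\cdots}\right) f_{S+jM}\left(\frac{t}{\cdots}\right). \qquad (6)$$

From now on we will call the sums $\Sigma_1$ and $\Sigma_2$ related to this equation $\Sigma_1^j$ and $\Sigma_2^j$.

For $\Sigma_2^j$, we will use equation (4) together with Parseval's identity and the Cauchy-Schwarz inequality to get

$$|\Sigma_2^j| \le \cdots \sum_{t} \left|f_S\left(\frac{t}{\cdots}\right) f_{S+jM}\left(\frac{t}{\cdots}\right)\right| \le \cdots \sqrt{\sum_{t} \left|f_S\left(\frac{t}{\cdots}\right)\right|^2}\, \sqrt{\sum_{t} \left|f_{S+jM}\left(\frac{t}{\cdots}\right)\right|^2} = O\left(\frac{N\sqrt{N}}{\sqrt{P}}\right). \qquad (7)$$

For $\Sigma_1^j$, we will use the facts that $q(t) \mid M$ if $q(t) < P$ and that $jMr < N^{-\epsilon}$ (for the values of $M$, $j$ and $Q$ chosen), to get

$$f_{S+jM}\left(\frac{t}{\cdots}\right) = \sum_{x \in S+jM} e\left(x\,\frac{t}{\cdots}\right) = \sum_{x \in S+jM} e\left(\frac{a(t)x}{q(t)} + rx\right) = \sum_{y \in S} e\left(\frac{a(t)(y + jM)}{q(t)} + r(y + jM)\right)$$

$$= \sum_{y \in S} e\left(\frac{a(t)y}{q(t)} + ry\right) e\left(\frac{M}{q(t)}\,ja(t)\right) e(jMr) = e(jMr) \sum_{y \in S} e\left(y\,\frac{t}{\cdots}\right) = e(jMr)\, f_S\left(\frac{t}{\cdots}\right).$$

Therefore,

$$\left(1 - N^{-\epsilon}\right) f_S\left(\frac{t}{\cdots}\right) \le f_{S+jM}\left(\frac{t}{\cdots}\right) \le \left(1 + N^{-\epsilon}\right) f_S\left(\frac{t}{\cdots}\right).$$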

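Again, using Parseval's identity, we get

$$|\Sigma_1^j| = \left|\sum_{t:\; q(t) \le P} f_{SQ}\left(\frac{t}{\cdots}\right) f_S\left(\frac{t}{\cdots}\right) f_{S+jM}\left(\frac{t}{\cdots}\right)\right| < \left|\sum_{t:\; q(t) \le P} f_{SQ}\left(\frac{t}{\cdots}\right) f_S\left(\frac{t}{\cdots}\right) f_S\left(\frac{t}{\cdots}\right)\right| + N^{-\epsilon} \sum_{t} \left|f_{SQ}\left(\frac{t}{\cdots}\right)\right| \left|f_S\left(\frac{t}{\cdots}\right)\right|^2$$

$$< O\left(\frac{N\sqrt{N}}{\sqrt{P}}\right) + N^{\frac{3}{2}-\epsilon} \qquad \text{(using equation (5))}. \qquad (8)$$

Using equation (8) and equation (7), we get

$$\left|\Sigma_1^j + \Sigma_2^j\right| \le \left|\Sigma_1^j\right| + \left|\Sigma_2^j\right| = O\left(\frac{N\sqrt{N}}{\sqrt{P}}\right).$$

6 Proof of Main Theorem

Lemmas 2 and 3 imply that

$$\left(\frac{11}{32} + \delta\right)^2 \le \alpha_1 \le \alpha_2 \le \cdots \le \alpha_l \le \frac{11}{32} + \delta.$$

It cannot be the case that $\alpha_{i+1} - \alpha_i \ge \sigma$ for all $i$. Therefore, let us say that $\alpha_{i+1} - \alpha_i < \sigma$ for some $i$. By Lemma 6, we have a pair of good residue classes mod $q_i$; denote them by $a$ and $b$. Let

$$G_a = \left\{w_a : \epsilon_{i+1,\,a+w_aq_i} \ge \frac{\delta}{4}\right\} \quad\text{and}\quad G_b = \left\{w_b : \epsilon_{i+1,\,b+w_bq_i} \ge \frac{\delta}{4}\right\}.$$

Consider the residue class $c \equiv a + b$ modulo $q_i$. It is a QR modulo $q_i$. When class $c$ is subdivided into smaller residue classes modulo $q_{i+1}$, some of them will be QRs modulo $q_{i+1}$. Let us denote their index set as

$$G_c = \left\{w_c : 0 \le w_c < \frac{q_{i+1}}{q_i} \text{ and } c + w_cq_i \text{ is a QR modulo } q_{i+1}\right\}.$$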

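[Figure: the residue classes $a$, $b$ and $c = a + b$ modulo $q_i$, each refined into residue classes modulo $q_{i+1}$ indexed by $G_a$, $G_b$ and $G_c$.]

Let us denote $|G_c| = L$. Note that when we shift through an interval of length $\frac{hq_{i+1}}{q_i}$ in residue class $c$ modulo $q_i$, we are shifting through an interval of length $h$ in each of the refined residue classes modulo $q_{i+1}$. Using Lemma 7, the number of perfect squares in the interval of length $H = \frac{hq_{i+1}}{q_i}$ is at least $\frac{hq_{i+1}}{4q_i\sqrt{N}}$. Then it is easy to see that the number of perfect squares in the corresponding interval of length $h$ in any of the residue classes in $G_c$ is at least $\frac{hq_{i+1}}{8q_iL\sqrt{N}}$.

Let

$$W_{w_c} = \left\{(w_a, w_b) : w_a \in G_a \text{ and } w_b \equiv w_c - w_a + \frac{c - a - b}{q_i} \text{ modulo } \frac{q_{i+1}}{q_i}\right\}.$$

Note that when we add the residue classes $a + w_aq_i$ and $b + w_bq_i$ for any pair $(w_a, w_b)$ in $W_{w_c}$, we get the QR $c + w_cq_i$ modulo $q_{i+1}$. From Lemma 6 and Definition 3,

$$|G_a| = |W_{w_c}| \ge \frac{3q_{i+1}}{4q_i} \quad\text{and}\quad |G_b| \ge \frac{3q_{i+1}}{4q_i}.$$

So at least 50% of the elements $(w_a, w_b)$ in $W_{w_c}$ are such that $w_a \in G_a$ and $w_b \in G_b$. That is, if we define $G_{w_c} = W_{w_c} \cap (G_a \times G_b)$, then $|G_{w_c}| \ge \frac{q_{i+1}}{2q_i}$.

Choose any pair $(w_a, w_b) \in G_{w_c}$. The number of solutions we get for the equation $x + y = z^2$, given that we choose $x$ from residue class $a + w_aq_i$ and $y$ from residue class $b + w_bq_i$ modulo $q_{i+1}$ and do the shifting in the interval of length $h = N^{1-2\epsilon}$, $\epsilon > 0$, is at least

$$\frac{\delta N}{4q_{i+1}} \cdot \frac{\delta N}{4q_{i+1}} \cdot \frac{hq_{i+1}}{8q_iL\sqrt{N}}.$$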

If we do this for all pairs in $G_{w_c}$ and then repeat it for all different possible $w_c \in G_c$, we find that the number of solutions for $x + y = z^2$ is at least

$$\sum_{w_c \in G_c} \frac{q_{i+1}}{2q_i} \cdot \frac{\delta N}{4q_{i+1}} \cdot \frac{\delta N}{4q_{i+1}} \cdot \frac{hq_{i+1}}{8q_iL\sqrt{N}} = \sum_{w_c \in G_c} \frac{\delta^2 N\sqrt{N}\,h}{256\,q_i^2 L} = \frac{\delta^2 N\sqrt{N}\,h}{256\,q_i^2}.$$

This contradicts the upper bound that we get when we use the Shifting Lemma with $P = p_{i+1}$ and $C = 1$ (that is, $M = q_{i+1}$), since

$$O\left(\frac{N\sqrt{N}\,h}{\sqrt{p_{i+1}}}\right) \ll \frac{\delta^2 N\sqrt{N}\,h}{256\,q_i^2}$$

(owing to Lemma 1).

References

[EG-80] P. Erdős, R. L. Graham, Old and New Problems and Results in Combinatorial Number Theory, Monogr. Enseignement Math., 28, 1980.

[M] J. P. Massias, Sur les suites dont les sommes des termes 2 à 2 ne sont pas des carrés.

[LOS-82] J. C. Lagarias, A. M. Odlyzko, J. B. Shearer, On the Density of Sequences of Integers the Sum of No Two of which Is a Square. I. Arithmetic Progressions, Journal of Combinatorial Theory, Series A, 33, pp. 167-185, 1982.

[LOS-83] J. C. Lagarias, A. M. Odlyzko, J. B. Shearer, On the Density of Sequences of Integers the Sum of No Two of which Is a Square. II. General Sequences, Journal of Combinatorial Theory, Series A, 34, pp. 123-139, 1983.

[S-78] A. Sárközy, On Difference Sets of Sequences of Integers, I, Acta Math. Acad. Sci. Hungar., 31, pp. 125-149, 1978.

[V-97] R. C. Vaughan, The Hardy-Littlewood Method, 2nd edition, Cambridge University Press, 1997.