Perspectives of Mathematics - PDF Free Download

Perspectives of Mathematics Module on Diophantine Equations and Geometry Fall 2004 Michael Stoll What is a diophantine equation? 1. Introduction and Examples Being diophantine is not so much a property of the equation (or, more generally, system of equations), but rather related to the kind of solutions we want to know about. Definition. A diophantine equation (or system of equations) is an algebraic equation (or system of equations) that we want to solve in integers or rational numbers. An algebraic equation is one that only uses addition, multiplication and constants (which will be integers for us); in other words, the equations take the form F (X 1,..., X n ) = 0 where F Z[X 1,..., X n ] is a polynomial with integral coefficients in a certain number of variables. The reason why these equations are named after Diophantus of Alexandria (of whom only very little is known; even his lifespan has only vaguely be determined to likely have been in the third century AD) is that he has written a series of books called Arithmetica, where he deals with certain types of such equations. His books constitute the earliest known fairly systematic treatment of problems of this kind. Here are some examples. (1) A linear equation. ax + by = c (where a, b, c are given integers). (2) Pythagorean triples. X 2 + Y 2 = Z 2. (3) Fermat s equation. X n + Y n = Z n (where n 3 is a specified integer). (4) Sums of four squares. X 2 1 + X 2 2 + X 2 3 + X 2 4 = n (where n > 0 is a specified integer). (5) An instance of Pell s equation. X 2 409 Y 2 = 1. (6) The rational box. X 2 + Y 2 = U 2, X 2 + Z 2 = V 2, Y 2 + Z 2 = W 2, X 2 + Y 2 + Z 2 = T 2. (7) An elliptic curve. Y 2 = X 3 + 7823. (8) A generalized Fermat equation. X 2 + Y 3 = Z 7.

2 In example (7), we are interested in rational solutions, whereas in examples (1), (4) and (5), we are interested in integral solutions. For examples (2), (3) and (6), it doesn t really matter, since we can scale solutions by a common factor. In these cases, we will be interested in primitive integral solutions, i.e., solutions in integers without common factor. For example, (X, Y, Z) = (3, 4, 5) is a primitive integral solution of X 2 + Y 2 = Z 2, while (15, 36, 39) is not. In these homogeneous cases (where in every equation, all terms have the same total degree), we only consider non-trivial solutions the trivial solution is when all variables take the value zero (the trivial solution is obviously not primitive, so it is ruled out automatically when we restrict to primitive integral solutions). In example (8), we are also interested in primitive integral solutions (for less obvious, but very good reasons). Now, associated to such an equation or system of equations, there are two basic problems. (1) Are there (rational/integral/primitive integral) solutions at all? (2) If so, can we describe them all? Is there anything we can say in general about these problems? Yes, there is. The question whether there exists an algorithm to solve the first problem in the case of integral solutions is known as Hilbert s tenth problem. David Hilbert, who was a famous German mathematician and one of the last people who had an overview over mathematics as a whole, gave a famous talk at the International Congress of Mathematicians in 1900, where he listed 23 problems, whose solution he thought would move mathematics ahead. His tenth problem was eventually solved in the 1970s, but the solution was negative: there is no such procedure. Interestingly, Matijasevich, who finished the proof, used the theory of the Pell Equation in his work. (NB: This might be a term paper project!) In the case of rational solutions, the problem is open; it is the fundamental open problem in the area. It is likely that at least for certain classes of equations (for example, equations in two variables), there is an algorithm, but for the general problem, it is hard to judge. The second problem is, of course, even harder. We may easily spot one (or several) solutions, but to find them all (or to prove that there are no others besides the ones we found) can be really difficult. Of course, if the answer to the first problem is no, then the second is solved at the same time! To finish this introduction, let me give you some indications about the answers to the two problems for the examples I gave earlier. (1) ax + by = c (where a, b, c are given integers). This has solutions if and only if c is a multiple of the gcd of a and b. In this case, we can find one solution (x 0, y 0 ) by the Euclidean algorithm, and all solutions are then given by (x, y) = (x 0 bt, y 0 + at) for t Z. (2) X 2 + Y 2 = Z 2. There is a well-known parametrization of the primitive integral solutions, which we will derive soon in various ways. (3) X n + Y n = Z n (where n 3 is a specified integer). The only solutions are those where one of the variables vanishes. This is the statement of Fermat s Last Theorem, which only became a theorem about ten years ago.

(4) X1 2 + X2 2 + X3 2 + X4 2 = n (where n > 0 is a specified integer). Lagrange proved that this always has solutions. There is also a formula for the number of solutions for any given n; it is { 8 if n is odd 24 if n is even } d n,d odd (5) X 2 409 Y 2 = 1. Besides the obvious solutions with Y = 0, this equation has in fact infinitely many integral solutions. They are all generated (by a recurrence formula) by the rather large basic solution X = 1238789998647218582160, Y = 25052977273092427986049. (6) X 2 + Y 2 = U 2, X 2 + Z 2 = V 2, Y 2 + Z 2 = W 2, X 2 + Y 2 + Z 2 = T 2. Whether this has any nontrivial solution is an open problem. (There are solutions known when one of the variables is allowed to take an irrational value.) The main reason why this problem is hard is that the associated geometric object is a surface, whereas in most of the cases that can be solved, it is a curve. (7) Y 2 = X 3 + 7823. This has infinitely many rational solutions, again generated by a basic solution, which is X = 2263582143321421502100209233517777 11981673410095561 2 Y = 186398152584623305624837551485596770028144776655756 11981673410095561 3 (Found be me a few years ago.) (8) X 2 + Y 3 = Z 7. This has exactly the following primitive integral solutions. (±1, 1, 0), (±1, 0, 1), ±(0, 1, 1), (±3, 2, 1), (±71, 17, 2), (±2213459, 1414, 65), (±15312283, 9262, 113), (±21063928, 76271, 17). This is a recent result of myself together with Bjorn Poonen and Ed Schaefer. 2. Pythagorean Triples Let us start with the rather well-known case of Pythagorean Triples to show various methods to solve diophantine equations. First Approach: Algebraic/Arithmetic. We first remind ourselves that we are looking for primitive integral solutions of our equation X 2 + Y 2 = Z 2. This implies that X, Y, Z cannot be all even. Can they all be odd? No, since then the left hand side is even, but the right hand side is odd. Arguing like this, we see that two of the variables have to be odd, the last one even. Now I claim that Z must be odd. Otherwise X and Y are both odd, and then X 2 and Y 2 are both of the form 4k + 1, so their sum is of the form 4k + 2; however, Z 2 is divisible by 4, a contradiction. d 3

4 So we can assume that X is even, Y and Z are odd (but we have to bear in mind that there is a symmetric case with Y even). Now we write X 2 = Z 2 Y 2 = (Z Y )(Z + Y ) and observe that each of the two factors on the right is even: We can also set X = 2W. This leads to Z Y = 2U, Z + Y = 2V. W 2 = UV, and I claim that U and V have no common divisors. For any common divisor d would also divide Y = V U and Z = V + U and thence X, contradicting the assumption that the solution is primitive. Now we need the following key fact. Lemma. If a and b are coprime integers such that ab = c 2 is a square, then either a = u 2, b = v 2, or a = u 2, b = v 2 with suitable integers u and v. Proof. We consider the prime factorizations of a, b and c 2. In the latter, all primes occur with an even exponent. Now, since a and b do not have a common prime divisor, any prime p occurring in c can only occur either in a or in b (but not in both). So the power of p dividing whichever of a or b must be the same as the one dividing c 2, hence the exponent is even. So all primes dividing a or b occur in the factorization with an even exponent; this implies a = ±u 2 and b = ±v 2 with integers u and v. Finally, c 2 is positive, so a and b must have the same sign. Note that this proof is based on two facts. (1) The ring Z of integers has unique factorization: every integer can be written as a unit times a product of prime numbers, which is uniquely defined up to the ordering of the factors. (This is the so-called fundamental theorem of arithmetic.) (2) Z has just the two units 1 and 1. (A unit is an element that has a multiplicative inverse in the ring.) Let me note here that this fairly innocent-looking lemma is in fact very strong and (together with its generalizations) can be used to deal with many diophantine problems. For example, this is true for Example (7) above. Let us now use this lemma for the solution of our equation. We had rewritten it in the form W 2 = UV, where U and V are coprime. Applying the lemma, we find integers S and T such that U = S 2, V = T 2 or U = S 2, V = T 2. Going back to the original variables, we find or Y = T 2 S 2, Z = T 2 + S 2, X = 2T S Y = S 2 T 2, Z = (S 2 + T 2 ), X = 2ST. In each case, S and T must be of different parity in order to get a primitive solution. For the complete solution, we have to add another pair of such formulas, where the roles of X and Y are switched.

Second Approach: Geometric. Now let us prove the same result in a rather different way. We first observe that Z 0 in any primitive solution, which allows us to divide the equation by Z 2. Setting x = X/Z and y = Y/Z, we obtain x 2 + y 2 = 1, and we are now looking for rational solutions. You certainly know from school that this equation has a geometric meaning: it characterizes those points (x, y) in the cartesian plane whose distance from the origin is 1, i.e., the points on the unit circle. In this view, we are asking about the rational points on the unit circle, i.e., the points with rational coordinates. Now there are four easy points to spot: (1, 0), (0, 1), ( 1, 0), (0, 1). Can we use this to find more? The idea is to use lines. Fix one of the points above, say P = ( 1, 0). If Q = (x, y) is any other rational point on the circle, then the line joining P and Q will have rational slope y/(x + 1) (and intercept, which happens to be the same number). Conversely, if we consider the line through P with rational slope t, then it will intersect the circle again at another rational point. Let s do the math: The equation of the line is y = t(x + 1). Plugging this into the equation of the circle, we get x 2 + t 2 (x + 1) 2 = 1 or (1 + t 2 )x 2 + 2t 2 x + t 2 1 = 0. Now we know one solution of this, namely x = 1, corresponding to P. Dividing by x + 1, we get (1 + t 2 )x + t 2 1 = 0, and so x = 1 t2 1 + t 2. In this way, we get a bijection between the rational points on the circle except P and the rational numbers: Q {(x, y) Q 2 : x 2 + y 2 = 1, x 1} t ( ) 1 t 2 2t, 1+t 2 1+t 2 y (x, y) x+1 The missing point P on the right can be considered as corresponding to the limiting case t. To translate this back into primitive integral solutions to our original equation in X, Y and Z, we have to look at numerator and denominator of x and y. We write t = V/U as a fraction in lowest terms; then we have X Z = U 2 V 2 U 2 + V 2, Y Z = 2UV U 2 + V 2. Now if U and V are not both odd (so they are of different parity, since they cannot be both even either), then U 2 V 2 and U 2 + V 2 are coprime, and we get, taking Z to be positive, X = U 2 V 2, Y = 2UV, Z = U 2 + V 2 as before. If U and V are both odd, then we can write U + V = 2S, U V = 2T ; 5

6 this gives U 2 V 2 = 4ST, U 2 + V 2 = (S + T ) 2 + (S T ) 2 = 2(S 2 + T 2 ) and 2UV = 2(S 2 T 2 ), hence X Z = 2ST S 2 + T, Y 2 Z = S2 T 2 S 2 + T. 2 Since U = S + T is odd, S and T are of different parity, and the fractions on the right are in lowest terms. This gives the other parametrization, taking care of the case that X is even. Note that we get the point P by taking U = 0, V = 1, which corresponds to t = 1/0 =. 3. Ternary Quadratic Forms and Lattice Points Ternary Quadratic Forms. We now want to generalize the Pythagorean Triples example to more general equations of the shape Q(X, Y, Z) = a X 2 + b Y 2 + c Z 2 = 0. We again observe that if (X, Y, Z) is a nontrivial solution, then also (λx, λy, λz) is a nontrivial solution, for any λ 0. So we can either restrict to primitive integral solutions as before, or we can consider all these scaled solutions as being essentially the same solution, and therefore ask for solutions up to scaling. If we do this, it does not make a difference whether we ask for integral or for rational solutions. Given an equation like the one above, there are three problems one might want to solve. (1) Does there exist a (nontrivial integral or rational) solution? (2) If one exists, find one! (3) Describe all solutions, given one solution! We have seen how the last problem can be solved using geometry in the special case X 2 + Y 2 Z 2 = 0. The only property of the equation we need for this is that it is of degree 2. So this procedure readily generalizes. As an example, consider X 2 + 2 Y 2 = 17 Z 2. After a few tries, we find a solution (X, Y, Z) = (3, 2, 1). We divide by Z 2 and obtain (again writing x = X/Z and y = Y/Z) the equation of an ellipse, x 2 + 2 y 2 = 17, which has a rational point P = (3, 2). Taking the line through P of slope t, with equation y = t(x 3) + 2, and computing the other point of intersection with the ellipse, we get: and so x 2 + 2(t(x 3) + 2) 2 17 = 0 (1 + 2t 2 )x 2 + 4t(2 3t)x + (18t 2 24t 9) = 0 (x 3)((1 + 2t 2 )x + ( 6t 2 + 8t + 3)) = 0 x = 6t2 8t 3 y = t(x 3) + 2 = 4t2 6t + 2. 2t 2 + 1 2t 2 + 1 In this way, we get again a bijection between the rational points on the ellipse (which are in bijection with integral/rational solutions to the original equation,

up to scaling) and the set of rational numbers, plus infinity (which corresponds to the point (x, y) = (3, 2)). (Question: which value of t corresponds to the point P?) Before we look at the first problem, let us make some reductions. We assume that a, b, c are nonzero integers. Now if a, b, c have a common divisor d, we can simply divide the equation by d and thus obtain a smaller equation. But we can also do something if just two coefficients, a and b, say, have a common divisor d: write a = a d, b = b d and multiply the equation by d; then you get a d 2 X 2 + b d 2 Y 2 + cd Z 2 = a (dx) 2 + b (dy ) 2 + cd Z 2 = 0. So, scaling the variables appropriately, we obtain the smaller equation a X 2 + b Y 2 + cd Z 2 (if we measure size by the absolute value of the product of the coefficients). Continuing in this way, we arrive at an equation where the coefficients a, b, c are coprime in pairs. Finally, again by scaling the variables, we can get rid of any square factors in the coefficients; therefore we can also assume that a, b, c are squarefree (i.e., not divisible by a square > 1). Now, coming back to the first problem, we can easily prove that solutions exist by exhibiting one. On the other hand, how can we prove that there are no solutions? Consider for example the case that all of a, b, c are positive. Then there is obviously no nontrivial solution, since the expression on the left hand side will always be positive: there are not even nontrivial solutions in real numbers. On the other hand, we can also use divisibility arguments. Take for example X 2 + Y 2 3 Z 2 = 0. There are certainly solutions in real numbers here, but still there are no integral solutions. For assume (X, Y, Z) is an integral solution; then by scaling, we can also assume that not all of X, Y, Z are divisible by 3. From the equation, we get that 3 divides X 2 +Y 2, and since X 2 is either of the form 3k or of the form 3k +1, this is only possible when 3 divides both X and Y. But then 9 divides 3 Z 2, and so 3 also divides Z, a contradiction. So no solution can exist. (Exercise: prove this by an argument using divisibility by 4!) What necessary conditions do we get in this way? Proposition. If there is a nontrivial integral or rational solution of a X 2 + b Y 2 + c Z 2 = 0, where a, b, c are nonzero integers, coprime in pairs and squarefree, then a, b, c cannot all have the same sign. Furthermore, there exist integers u, v, w such that a u 2 + bc, b v 2 + ac, c w 2 + ab. Proof. The first assertion is clear. For the other statement, assume (without loss of generality) that (X, Y, Z) is a primitive integral solution. Then a divides (by ) 2 + bcz 2. If Z and a had a common prime divisor p, then p would have to divide by and hence Y (since gcd(a, b) = 1). But then p 2 will divide ax 2, hence (since a is squarefree, so p 2 a) p also divides X, contradicting the assumption that (X, Y, Z) is a primitive solution. So we have gcd(a, Z) = 1; this implies that there is an integer r such that a rz 1. Taking u = rby, we see that a u 2 + bc. The existence of v and w is proved in the same way. 7

8 Note that we can efficiently check these necessary conditions if we know the prime factorizations of a, b, and c. (For all prime divisors p of a, we need to check that bc is a square mod p, etc. If we want to find u, v, w, we have to extract square roots modulo primes and then combine the results using the Chinese Remainder Theorem.) The surprising fact now is that these conditions are already sufficient! Theorem (Legendre). If a, b, c are nonzero integers, squarefree and coprime in pairs, not all of the same sign, and if there exist integers u, v, w such that a u 2 + bc, b v 2 + ac, c w 2 + ab, then a X 2 + b Y 2 + c Z 2 = 0 has a nontrivial integral solution. The basic idea of the proof is the following. We want to show that (for a suitable choice of (X, Y, Z) (0, 0, 0)), a certain integer z = a X 2 + b Y 2 + c Z 2 is zero. This can be done by showing two properties of z: (1) z is a multiple of d for some non-zero integer d; (2) z < d for the same d. In the proof, we will first construct a suitable subset of Z 3 such that its elements (x, y, z) satisfy 2abc a x 2 + b y 2 + c z 2. Then we will use a nice theorem to conclude that for at least one non-zero element, we also have a x 2 + b y 2 + c z 2 < 2 abc. Lattices and Minkowski s Theorem. What is a lattice? For our purposes, we will use the following definition (the usual definition is a bit more general). A lattice in R n is a subgroup Λ of finite index in Z n. (The finite index condition is equivalent to saying that there is some integer N 1 such that (NZ) n Λ.) The index (Λ) = (Z n : Λ) = #(Z n /Λ) is called the covolume of Λ. Examples of lattices are Z n itself or the even lattice Λ = {(x, y, z) Z 3 : x + y + z is even} with (Λ) = 2. More generally, we have the following. Lemma. (1) If gcd(a, r, s, t) = 1 and a 0, then Λ = {(x, y, z) Z 3 : a rx + sy + tz} is a lattice of covolume a. (2) If Λ 1, Λ 2 Z n are lattices in R n, then Λ = Λ 1 Λ 2 is again a lattice. If we also have that gcd( (Λ 1 ), (Λ 2 )) = 1, then (Λ) = (Λ 1 ) (Λ 2 ). Let us use these facts in order to come up with a lattice that will help prove the theorem. Lemma. Under the hypotheses of the theorem, there exists a lattice Λ such that (Λ) = 2 abc and such that 2abc a x 2 + b y 2 + c z 2

9 for all (x, y, z) Λ. Proof. For simplicity, we will assume here that abc is odd. The even case is very similar, but a little bit more involved. Let us define four lattices: Λ 1 = {(x, y, z) Z 3 : a by uz} Λ 2 = {(x, y, z) Z 3 : b cz vx} Λ 3 = {(x, y, z) Z 3 : c ax wy} Λ 4 = {(x, y, z) Z 3 : 2 x + y + z} By the first part of the lemma, we have (Λ 1 ) = a, (Λ 2 ) = b, (Λ 3 ) = c, and (Λ 4 ) = 2. So if we set Λ = Λ 1 Λ 2 Λ 3 Λ 4, we get by repeated application of the second part of the lemma that (Λ) = 2 abc. It remains to show that the divisibility statement holds. Let (x, y, z) Λ 1. Then we have the following chain of implications. a u 2 + bc = a (u 2 + bc)z 2 = a (uz) 2 + bc z 2 = a (by) 2 + bc z 2 = a b(b y 2 + c z 2 ) = a b y 2 + c z 2 = a a x 2 + b y 2 + c z 2 Similarly, we see that for (x, y, z) Λ 2 (resp. Λ 3 ), we have that b (resp. c) divides ax 2 + by 2 + cz 2. Since a, b and c are odd, we also see that ax 2 + by 2 + cz 2 is always even for (x, y, z) Λ 4. Taking all of this together, for (x, y, z) Λ, the value ax 2 + by 2 + cz 2 is divisible by a, b, c and 2. Since these four numbers are relatively prime in pairs, it follows that 2abc divides ax 2 + by 2 + cz 2. Now we want to prove that we can find (x, y, z) such that ax 2 +by 2 +cz 2 is actually zero. We know that we can at least arrange to have the value divisible by 2abc. So in order to ensure to get a zero value, it suffices to show that there is some (0, 0, 0) (x, y, z) Λ such that ax 2 + by 2 + cz 2 < 2abc. The following theorem will allow us to produce that such an element. Theorem (Minkowski). Let Λ R n be a lattice of covolume (Λ), and let S R n be convex and symmetric (meaning that if s S, we also have s S), such that vol S > 2 n (Λ). Then S contains a non-zero lattice point from Λ. In order to use this result for the proof of Legendre s Theorem, we have to come up with a sufficiently large convex symmetric set that produces the bound on ax 2 + by 2 + cz 2. At this point, we have to use the not all signs equal condition. Let us assume (without loss of generality) that a, b > 0 and c < 0. Then we can take S to be the elliptic cylinder S = {(x, y, z) R 3 : ax 2 + by 2 < 2 abc, z 2 < 2ab}. Let us check the volume condition. We have vol S = 2π abc ab 2 2ab = 4 2π abc > 16 abc = 2 3 (Λ), so this is OK, and Minkowski s Theorem tells us that there is some (0, 0, 0) (x, y, z) S Λ. For this (x, y, z), we have on the one hand that 2abc ax 2 + by 2 + cz 2.

10 On the other hand, by the definition of S, so 0 ax 2 + by 2 < 2 abc and 0 cz 2 > 2 abc, ax 2 + by 2 + cz 2 < 2 abc. Together, these conditions imply that ax 2 + by 2 + cz 2 = 0, finishing the proof. Here is a sketch of how you prove Minkowski s Theorem. The first step is to consider X = 1 S (i.e., the set S shrunk by a factor of 1/2) and to show that X 2 meets one of its translates X + λ for some λ Λ \ {0}. The idea for this is to note that vol X = 2 n vol S > (Λ), and so if all translates X +λ (λ Λ) were pairwise disjoint, then the union would be too large. More precisely, a big hypercube of sidelength N contains about N n / (Λ) lattice points, and so it contains about this number of translated copies of X. If they are disjoint, the volume of their union would be approximately vol X N n / (Λ) > CN n with some C > 1. On the other hand, the volume of the hypercube is only N n, leading to a contradiction. (To make this rigorous, one has to estimate the errors due to boundary effects, but this can easily be done.) This shows that there are distinct λ 1, λ 2 Λ such that (X + λ 1 ) (X + λ 2 ). But then (translating by λ 1 ), we also have that X (X + λ) with λ = λ 2 λ 1. Concretely, this means that there are x, y X such that x = y+λ, or x y = λ 0. Now S = 2X, hence 2x S, 2y S, then (since S is symmetric) 2y S, and finally λ = 1 (2x + ( 2y)) S (since S is convex: with two points, it contains the 2 midpoint of the line segment joining them). 4. Sums of Four Squares In this section, we want to state and prove Lagrange s Theorem on sums of four squares. Theorem (Lagrange). integer squares. For example, Every positive integer n is the sum of (at most) four 1 = 1 2, 2 = 1 2 + 1 2, 3 = 1 2 + 1 2 + 1 2, 4 = 1 2 + 1 2 + 1 2 + 1 2 = 2 2, 5 = 1 2 + 2 2, 6 = 1 2 + 1 2 + 2 2, 7 = 1 2 + 1 2 + 1 2 + 2 2, 8 = 2 2 + 2 2, 9 = 1 2 + 2 2 + 2 2 = 3 2, 10 = 1 2 + 1 2 + 2 2 + 2 2 = 1 2 + 3 2,... Note that 7 cannot be written as a sum of less than four squares, so four squares will be necessary in general. How can we prove a statement like this? The proof we will see has two main parts. The first part uses a multiplicative property of sums of four squares to reduce to the case that n is a prime number p. The second part is somewhat in the spirit of the proof of Legendre s Theorem: we set up a suitable lattice that gives us divisibility by p and then use Minkowski s Theorem to deduce that some lattice point does what we want. Let us look at the first part. The result we need is as follows. Lemma. If m and n are both sums of four squares, then so is mn.

11 Proof. Suppose that m = x 2 1 + x 2 2 + x 2 3 + x 2 4 and n = y1 2 + y2 2 + y3 2 + y4 2. Then one can easily check that mn = (x 1 y 1 x 2 y 2 x 3 y 3 x 4 y 4 ) 2 + (x 1 y 2 + x 2 y 1 + x 3 y 4 x 4 y 3 ) 2 + (x 1 y 3 x 2 y 4 + x 3 y 1 + x 4 y 2 ) 2 + (x 1 y 4 + x 2 y 3 x 3 y 2 + x 4 y 1 ) 2. Remark. What lies behind this is the existence of the skew field of quaternions, H = R + Ri + Rj + Rk where i 2 = j 2 = k 2 = ijk = 1; in particular, (a + bi + cj + dk)(a bi cj dk) = a 2 + b 2 + c 2 + d 2. By the lemma, using induction and the prime factorization of positive integers, we see that it suffices to show that every prime number p is a sum of four squares. The first step in approaching this is to show that at least a number divisible by p can be written in this way; more precisely, we show the following. Lemma. Let p be a prime number. Then there are integers u and v such that p divides 1 + u 2 + v 2. Proof. This is clear when p = 2 (take u = 1, v = 0), so we can assume that p is odd. The best way of looking at this is to compute with residue classes modulo p: we identify integers whose difference is a multiple of p. In this way, we get p different classes of integers (each comprising integers which leave the same remainder when divided by p), and we can add, subtract and multiply them by working with representative elements. Now the fact is: There are exactly (p + 1)/2 residue classes that are squares (of residue classes). To see this, we note that the classes of k and k have the same square, and the two classes are distinct unless both are the zero class of numbers divisible by p. On the other hand, two classes having the same square have to be identical or negatives of each other. (If p divides x 2 y 2, then p divides x y or x + y.) Since there are (p + 1)/2 (unordered) pairs consisting of a class and its negative, the claim follows. Using this, we see that there are (p + 1)/2 classes of the form 1 + u 2 and also (p + 1)/2 classes of the form v 2. Since there are only p classes altogether, there must be one class that is common to both sets of classes. But this exactly means that there are integers u and v such that p divides (1+u 2 ) ( v 2 ) = 1+u 2 +v 2. We now use these numbers u and v to define a lattice: Λ = {(x 1, x 2, x 3, x 4 ) Z 4 : p x 2 ux 1 vx 4, p x 3 vx 1 + ux 4 } Since there are two independent mod-p conditions, we get that (Λ) = p 2. Also, I claim that for every lattice element (x 1, x 2, x 3, x 4 ) Λ, we have that p divides x 2 1 + x 2 2 + x 2 3 + x 2 4. To see this, note that and so p x 2 ux 1 vx 4 = p x 2 2 (ux 1 + vx 4 ) 2 p x 3 vx 1 + ux 4 = p x 2 3 (vx 1 ux 4 ) 2 p x 2 1 + x 2 2 + x 2 3 x 2 4 p x 2 1 + (ux 1 + vx 4 ) 2 + (vx 1 ux 4 ) 2 + x 2 4 p (x 2 1 + x 2 4)(1 + u 2 + v 2 ),

12 which is true. Now we know that the sum of squares is divisible by p on Λ, we only need to know that Λ contains a sufficiently small element. We take S = {(x 1, x 2, x 3, x 4 ) R 4 : x 2 1 + x 2 2 + x 2 3 + x 2 4 < 2p} ; this is the open four-dimensional ball of radius 2p. What is its volume? Well, vol S = dx 1 dx 2 dx 3 dx 4 x 2 1 +x2 2 +x2 3 +x2 4 ( <2p ) = dx 3 dx 4 dx 1 dx 2 x 2 1 +x2 2 <2p x 2 3 +x2 4 <2p x2 1 x2 2 = π(2p x 2 1 x 2 2) dx 1 dx 2 = 2π 2 (2p r 2 ) r dr x 2 1 +x2 2 <2p 0 ( ) = 2π 2 pr 2 r4 2p = 2π 2 (2p 2 p 2 ) = 2π 2 p 2. 4 0 Since π 2 > 8, this is larger than 2 4 (Λ) = 16p 2, so by Minkowski s Theorem, there is some 0 (x 1, x 2, x 3, x 4 ) S Λ. Then we have that 2p p x 2 1 + x 2 2 + x 2 3 + x 2 4 and 0 < x 2 1 + x 2 2 + x 2 3 + x 2 4 < 2p, hence x 2 1 + x 2 2 + x 2 3 + x 2 4 = p, as was to be shown. 5. Pell s Equation In this section, we will discuss the equation X 2 d Y 2 = 1, to be solved in integers. Here, d > 1 is a squarefree integer. Let us look at some examples. For d = 2, we have the following solutions with x, y 0. x 1 3 17 99 577 3363 19601 y 0 2 12 70 408 2378 13860 For d = 3, we have And for d = 5, we get x 1 2 7 26 97 362 1351 y 0 1 4 15 56 209 780 x 1 9 161 2889 51841 930249 y 0 4 72 1292 23184 416020 We observe that there are nontrivial solutions, and also that they grow quite fast (the number of digits seems to grow linearly, so the numbers grow exponentially). The sequence of solutions seems to go on, so it appears that there are infinitely many solutions. Let us first prove an important fact about the structure of the solution set. More generally, we have the following.

13 Lemma. Let u, v, x, y be numbers such that u 2 dv 2 = a, x 2 dy 2 = b. Then (ux + d vy) 2 d (uy + vx) 2 = ab. Proof. Simple calculation. What is behind this is the following. We consider numbers of the form x + y d with x, y Z. The map N : x + y d (x + y d)(x y d) = x 2 dy 2 is called the norm (on the ring Z[ d] whose elements are these numbers). The lemma then says that the norm is multiplicative, i.e. N(αβ) = N(α)N(β) note that (u + v d)(x + y d) = (ux + dvy) + (uy + vx) d. Now let S d = {(x, y) Z 2 : x 2 d y 2 = 1} be the solution set we are interested in. Then the lemma implies the following. The set S d is an abelian group with zero element (1, 0) and ad- Proposition. dition (x, y) + (x, y ) = (xx + d yy, xy + x y). The subset S + d of solutions with x > 0 forms a subgroup of index 2. When S+ d contains more than one element, then there is a fundamental solution with x, y > 0 and x as small as possible, and all other solutions in S + d are integral multiples of the fundamental solution (and all these integral multiples are distinct). Proof. By the lemma, the addition as defined, applied to elements of S d, yields solutions again. The group axioms are easily verified (note that the negative is (x, y) = (x, y)). In the context of the map φ : S d (x, y) x + y d R, we get a group homomorphism. The map is injective note that 1/φ(x, y) = x y d, and so x = (φ(x, y) + 1/φ(x, y))/2 and y = (φ(x, y) 1/φ(x, y))/(2 d). Under φ, S + d is the preimage of the positive reals we have dy 2 = x 2 + 1, so y d < x, hence the sign of φ(x, y) is that of x. Since the positive reals form a subgroup of the multiplicative group of nonzero reals, S + d is a subgroup of S d. Its index is 2, since the index of the positive reals is 2, and for example φ( 1, 0) = 1 < 0. Now assume that S + d {(1, 0)}. Then among all solutions with x, y > 0, there will be one with smallest x. Call it (x 1, y 1 ). This is the fundamental solution (and it is uniquely determined, since for any x, there are at most two values of y satisfying the equation, and only one can be positive). I claim that φ(x 1, y 1 ) is the smallest element in φ(s d ) that is greater than 1. This follows from x + y d > 1 x, y > 0 for all (x, y) S d (by the formulas for x and y above) and from for all (x, y), (x, y ) S d. φ(x, y ) > φ(x, y) if x > x > 0 and y, y > 0 Since α := φ(x 1, y 1 ) > 1, all its powers are distinct, hence so are the integral multiples of (x 1, y 1 ) in the group structure of S d.

14 Now assume that there is some (x, y) S + d that is not an integral multiple of (x 1, y 1 ). Then there is a unique n Z such that α n < β := φ(x, y) < α n+1. But then (x, y) n(x 1, y 1 ) S + d maps to βα n under φ, which is strictly between 1 and α, contradicting the choice of α = φ(x 1, y 1 ). In more concrete terms, this means that if there are nontrivial solutions, then there is a smallest positive fundamental solution (x 1, y 1 ), and all nontrivial solutions are then of the form (±x n, ±y n ), where for n 1, Equivalently, x n+1 = x 1 x n + d y 1 y n, y n+1 = y 1 x n + x 1 y n. x n + y n d = (x1 + y 1 d) n. Note (exercise!) that x n is a polynomial of degree n in x 1 alone, independent of d, e.g., x 2 = 2x 2 1 1, x 3 = 4x 3 1 3x 1. The next question, of course, is whether there are indeed nontrivial solutions. The answer is: Theorem. For every squarefree integer d > 1, the equation X 2 d Y 2 = 1 has infinitely many nontrivial solutions. They are obtained from the smallest positive solution in the way described above. In order to prove that theorem, we have to look at how well one can approximate irrational numbers by rational ones. This is quite natural in the context of Pell s Equation, since for any (positive) solution (x, y), the fraction x/y must be quite close to d: ( x ) 2 1 d = y y, 2 so x y 1 d = y 2 (x/y + d) < 1 2 d y. 2 Can we get such good approximations? Lemma. Let α R \ Q be an irrational real number. Then there are infinitely many fractions p/q Q such that p q α 1 < q. 2 Proof. The proof is combinatorial. We will in fact show that for every n 1, there is a fraction p/q with q n such that p/q α < 1/qn. This implies the statement of the lemma. Define the fractional part of a real number x to be x = x x (where x is the largest integer x); we always have 0 x < 1 and x x Z. Consider the n+1 numbers 0, α, 2α,... nα in the interval [0, 1). We subdivide this interval into n subintervals [0, 1/n), [1/n, 2/n),..., [(n 1)/n, 1). Then there must be two of the n + 1 numbers falling into the same subinterval, therefore we have lα kα < 1/n for some 0 k < l n. Let m = lα kα, then we get (l k)α m < 1 m = n l k α 1 < (l k)n, and 0 < l k n.

15 Now from this we obtain at least the following consequence. Lemma. There are infinitely many pairs (x, y) of positive integers such that x 2 d y 2 < 2 d + 1. Proof. By the previous lemma, there are infinitely many pairs (x, y) of positive integers such that x/y d < 1/y 2. Take such a pair. Then x y d < 1/y and therefore x + y d < 2y d + 1/y. Multiplying, we get x 2 d y 2 < 2y d + 1/y y = 2 d + 1 y 2 2 d + 1. There are only finitely many integers between 2 d 1 and 2 d + 1, so for some of them, m say, there must be infinitely many pairs (x, y) such that x 2 dy 2 = m. Note that m 0, since d is not a square. Now there are only finitely many residue classes modulo m, and hence there must be two distinct pairs (x, y) and (u, v) such that x 2 d y 2 = u 2 d v 2 = m and m x u, m y v. Then, adding (x, y) and (u, v), we have (xu dyv) 2 d(xv yu) 2 = m 2, but also that m divides xu dyv and xv yu: xu dyv x 2 dy 2 = m 0 mod m, xv yu xy xy = 0 mod m. On the other hand, since (x, y) and (u, v) are distinct, xv yu 0. So, setting s = (xu dyv)/m and t = (xv yu)/m, we have found a nontrivial solution of our equation X 2 d Y 2 = 1. Now we know that there always is a fundamental solution. But how can we find it? The answer is provided by continued fractions. Continued fractions provide a method of constructing good rational approximations. The continued fraction expansion of an irrational real number α is obtained as follows. Let α 0 = α, and then for n 0, 1 a n = α n, α n+1 = = 1 α n a n α n. Note that for n 1, α n > 1, and so a n 1. In addition, define p 2 = 0, q 2 = 1, p 1 = 1, q 1 = 0 p n+1 = a n p n + p n 1, q n+1 = a n q n + q n 1. Then for n 0, p n 1 = a 0 + =: [a q n 1 0 ; a 1, a 2,..., a n ], a 1 + 1 a 2 + + 1 a n and these fractions are in a certain sense the best rational approximations to α. In particular, we have p n α < 1. q n qn 2

16 Fact. If α = d with d > 1 squarefree, then the sequence a 1, a 2,... is periodic, i.e., there is some (smallest) k 1 such that for all n 1, one has a n+k = a n. We can recognize k by the property that it is the first n 1 such that α n = m + d for some m Z. Fact. Let k be the period as above, then x = p k 1, y = q k 1 is the smallest positive solution to X 2 d Y 2 = ±1. If the sign is positive (this is the case when k is even), then it is the fundamental solution. If the sign is negative (when k is odd), then its double (2x 2 + 1, 2xy) = (p 2k 1, q 2k 1 ) is the fundamental solution of X 2 d Y 2 = 1. Example. Let us compute the fundamental solution of X 2 23 Y 2 = 1. We find n α n a n p n q n 0 23 4 4 1 1 2 3 4 1 23 4 = 7 23 3 = 2 23 3 = 23 + 4 7 23 + 3 2 23 + 3 7 7 23 4 = 23 + 4 1 5 1 3 19 4 1 24 5 So k = 4, and x = 24, y = 5 is the fundamental solution. Some Ideas for Term Paper Projects (1) Look at, understand and perhaps implement the algorithm described by Cremona and Rusin (paper obtainable from me) for solving diagonal ternary quadratic equations ax 2 + by 2 + cz 2 = 0. (2) Learn about quadratic reciprocity and present one proof of the Law of Quadratic Reciprocity. See for example the notes for my Introductory Number Theory class (you can find them under Teaching at IUB / IntroNumTh- 2004-Spring on my home page), or Ireland-Rosen, 5, or Nathanson, 3. (3) Learn about the proof by descent of Lagrange s Theorem (my notes [see above] or Ireland-Rosen, 17.7). Same for representing p 1 mod 4 as a sum of two squares, plus some other examples of descent (e.g., X 4 ± Y 4 = Z 2 ). (4) Learn about the formula for the number of ways to represent a positive integer as a sum of four squares. (Ireland-Rosen, 17.7) (Either use the formula for two squares, or understand its proof, too.) (5) Pell s Equation and Hilbert s Tenth Problem. (Chapter 2 of Matiyasevich s book. Somewhat involved but elementary.) (6) Continued fractions and Pell s Equation. (Chapter 7 of Bressoud-Wagon, for example.)