PMATH 340 Lecture Notes on Elementary Number Theory. Anton Mosunov Department of Pure Mathematics University of Waterloo


PMATH 340 Lecture Notes on Elementary Number Theory
Anton Mosunov
Department of Pure Mathematics, University of Waterloo
Winter, 2017

Contents

1 Introduction
2 Divisibility. Factorization of Integers. The Fundamental Theorem of Arithmetic
3 Greatest Common Divisor. Least Common Multiple. Bézout's Lemma
4 Diophantine Equations. The Linear Diophantine Equation ax + by = c
5 Euclidean Algorithm. Extended Euclidean Algorithm
6 Congruences. The Double-and-Add Algorithm
7 The Ring of Residue Classes $\mathbb{Z}_n$
8 Linear Congruences
9 The Group of Units $\mathbb{Z}_n^*$
10 Euler's Theorem and Fermat's Little Theorem
11 The Chinese Remainder Theorem
12 Polynomial Congruences
13 The Discrete Logarithm Problem. The Order of Elements in $\mathbb{Z}_n^*$
14 The Primitive Root Theorem
15 Big-O Notation
16 Primality Testing
   16.1 Trial Division
   16.2 Fermat's Primality Test
   16.3 Miller-Rabin Primality Test
17 Public Key Cryptosystems. The RSA Cryptosystem
18 The Diffie-Hellman Key Exchange Protocol
19 Integer Factorization
   19.1 Fermat's Factorization Method
   19.2 Dixon's Factorization Method
20 Quadratic Residues
21 The Law of Quadratic Reciprocity
22 Multiplicative Functions
23 The Möbius Inversion
24 The Prime Number Theorem
25 The Density of Squarefree Numbers
26 Perfect Numbers
27 Pythagorean Triples
28 Fermat's Infinite Descent. Fermat's Last Theorem
29 Gaussian Integers
30 Fermat's Theorem on Sums of Two Squares
31 Continued Fractions
32 The Pell's Equation
33 Algebraic and Transcendental Numbers. Liouville's Approximation Theorem
34 Elliptic Curves

1 Introduction

This is a course on number theory, undoubtedly the oldest mathematical discipline known to the world. Number theory studies the properties of numbers. These may be integers, like $-2$, $0$ or $7$, or rational numbers like $1/3$ or $7/9$, or algebraic numbers like $\sqrt{2}$ or $i$, or transcendental numbers like $e$ or $\pi$. Though most of the course will be dedicated to Elementary Number Theory, which studies congruences and various divisibility properties of the integers, we will also dedicate several lectures to Analytic Number Theory, Algebraic Number Theory, and other subareas of number theory.

There are many interesting questions that one might ask about numbers. In search of answers to these questions mathematicians unravel fascinating properties of numbers, some of which are quite profound. Here are several curious facts about prime numbers:

1. Every odd number exceeding 5 can be expressed as a sum of three primes (Helfgott's Theorem, 2013);
2. There are infinitely many prime numbers p and q such that $0 < |p - q| \le 246$ (Zhang's Theorem, 2013; Zhang proved the result for 70,000,000, and the constant was reduced to 246 by Maynard, Tao, Konyagin and Ford);
3. There always exists a prime between two consecutive perfect cubes (Ingham's Theorem, 1937);
4. There are infinitely many primes of the form $x^2 + y^4$ (Friedlander-Iwaniec Theorem, 1997);
5. Up to $x > 1$, there are approximately $x/\log x$ prime numbers (Prime Number Theorem, 1896);
6. Given a positive integer d, there exist distinct prime numbers $p_1, p_2, \ldots, p_d$ which form an arithmetic progression (Green-Tao Theorem, 2004).

Despite the simplicity of their formulations, all of these results are highly nontrivial and their proofs rest on some deep theories. For example, the Green-Tao Theorem rests on Szemerédi's Theorem, which in turn uses the theory of random graphs.

There are many number-theoretic problems that are still open. At the 1912 International Congress of Mathematicians, the German mathematician

Edmund Landau listed the following four basic problems about primes that still remain unresolved:

1. Can every even integer greater than 2 be written as a sum of two primes? (Goldbach's Conjecture, 1742);
2. Are there infinitely many prime numbers p and q such that $|p - q| = 2$? (Twin Prime Conjecture, 1849);
3. Does there always exist a prime between two consecutive perfect squares? (Legendre's Conjecture, circa 1800);
4. Are there infinitely many primes of the form $n^2 + 1$? (see Bunyakovsky's Conjecture, 1857).

It is widely believed that the answer to each of the questions above is yes. There is a lot of computational evidence towards each of them, and for some of them conjectural asymptotic formulas have been established. However, none of them has been proved.

Aside from being an interesting theoretical subject, number theory also has many practical applications. It is widely used in cryptographic protocols, such as RSA (Rivest-Shamir-Adleman, 1977), the Diffie-Hellman protocol (1976), and ECIES (Elliptic Curve Integrated Encryption Scheme). These protocols rely on certain fundamental properties of finite fields (RSA, D-H) and elliptic curves defined over them (ECIES). For example, consider the Discrete Logarithm Problem: given a prime p and integers c, m, one may ask whether there exists an integer d such that $c^d - m$ is divisible by p, and if so, what its value is. We may write this in the form of a congruence

$c^d \equiv m \pmod{p}$.

When p is extremely large (hundreds of digits) and c, m are chosen properly, this problem is widely believed to be intractable; that is, no modern computer can solve it in a reasonable amount of time (the computation would require billions of years). This property is used in many cryptosystems, including the first two mentioned above.

Many cryptosystems, like RSA, can be broken by quantum computers. The construction of protocols resistant to attacks by quantum computers is the subject of Post-Quantum Cryptography, and number theory plays a crucial role there (see Lattice-Based or Isogeny-Based Cryptography).
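To make the Discrete Logarithm Problem concrete, here is a minimal Python sketch of the naive approach: try the exponents d = 0, 1, 2, ... in turn. The function name `discrete_log_bruteforce` and the toy values p = 23, c = 5, m = 8 are illustrative assumptions, not part of the notes. The loop takes on the order of p multiplications, which is exactly why the problem is considered out of reach when p has hundreds of digits.

```python
def discrete_log_bruteforce(c, m, p):
    """Return the least d >= 0 with c**d congruent to m modulo p, or None.

    Hypothetical helper for illustration only: it tries every exponent,
    so its running time grows linearly with p.
    """
    target = m % p
    value = 1 % p  # c**0 modulo p
    for d in range(p):
        if value == target:
            return d
        value = (value * c) % p  # move from c**d to c**(d+1)
    return None


if __name__ == "__main__":
    # Toy instance: 5**6 = 15625 = 23 * 679 + 8.
    print(discrete_log_bruteforce(5, 8, 23))
    # The powers of 2 modulo 7 are 1, 2, 4, 1, ..., so 3 is never reached.
    print(discrete_log_bruteforce(2, 3, 7))
```

For the toy prime the search finishes instantly; the point of the sketch is only that nothing fundamentally faster than "try many exponents" is known for well-chosen large parameters on classical computers.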

2 Divisibility. Factorization of Integers. The Fundamental Theorem of Arithmetic

Before we proceed, let us invoke a little bit of notation:

$\mathbb{N} = \{1, 2, 3, \ldots\}$, the natural numbers;
$\mathbb{Z} = \{0, \pm 1, \pm 2, \ldots\}$, the ring of integers;
$\mathbb{Q} = \{ m/n : m \in \mathbb{Z}, n \in \mathbb{N} \}$, the field of fractions;
$\mathbb{R}$, the field of real numbers;
$\mathbb{C} = \{a + bi : a, b \in \mathbb{R}, i^2 = -1\}$, the field of complex numbers.

We call $\mathbb{Z}$ a ring because $0, 1 \in \mathbb{Z}$ and $a, b \in \mathbb{Z}$ implies $a \pm b \in \mathbb{Z}$ and $a \cdot b \in \mathbb{Z}$. In other words, $\mathbb{Z}$ is closed under addition, subtraction and multiplication. Note, however, that $a, b \in \mathbb{Z}$ with $b \ne 0$ does not imply that $a/b \in \mathbb{Z}$, so it is not closed under division. A collection that is closed under addition, subtraction, multiplication and division by a non-zero element is called a field. According to this definition, every field is also a ring.

Exercise 2.1. Demonstrate the proper inclusions in $\mathbb{N} \subsetneq \mathbb{Z} \subsetneq \mathbb{Q} \subsetneq \mathbb{R} \subsetneq \mathbb{C}$. No proofs are required.

Definition 2.2. Let $a, b \in \mathbb{Z}$. We say that a divides b, or that a is a factor of b, when $b = ak$ for some $k \in \mathbb{Z}$. We write $a \mid b$ if this is the case, and $a \nmid b$ otherwise.

Example 2.3. $3 \mid 12$ because $12 = 3 \cdot 4$; $3 \nmid 13$; $-1 \mid 7$ because $7 = (-1) \cdot (-7)$; $0 \nmid 3$.

Proposition 2.4 (Proposition 1.2 in Frank Zorzitto, A Taste of Number Theory). Let $a, b, c, x, y \in \mathbb{Z}$.

1. If $a \mid b$ and $b \mid c$, then $a \mid c$;
2. If $c \mid a$ and $c \mid b$, then $c \mid ax \pm by$;
3. If $c \mid a$ and $c \mid b$, then $c \mid a \pm b$;
4. If $a \mid b$ and $b \ne 0$, then $|a| \le |b|$;
5. If $a \mid b$ and $b \mid a$, then $a = \pm b$;

6. If $a \mid b$, then $\pm a \mid \pm b$;
7. $1 \mid a$ for all $a \in \mathbb{Z}$;
8. $a \mid 0$ for all $a \in \mathbb{Z}$;
9. $0 \mid a$ if and only if $a = 0$.

Proof. Exercise.

Definition 2.5. Let $p \ge 2$ be a natural number. Then p is called prime if the only positive integers that divide p are 1 and p itself. It is called composite otherwise.

We remark that 1 is neither prime nor composite. We will also use the above terminology only with respect to integers exceeding 1 (so according to this convention $-3$ is not prime and $-6$ is not composite).

Exercise 2.6. Among the collection $-5, 1, 5, 6$, which numbers are prime?

Theorem 2.7. For each integer $n \ge 2$ there exists a prime p such that $p \mid n$.

Proof. We will prove this result using strong induction on n.

Base case. For $n = 2$ we have $2 \mid n$. Since 2 is prime, the theorem holds.

Induction hypothesis. Suppose that the theorem is true for $n = 2, 3, \ldots, k$.

Induction step. We will show that the theorem is true for $n = k + 1$. If n is prime, the result holds. Otherwise there exists a positive integer d such that $d \mid n$, $d \ne 1$ and $d \ne n$. By property 4 of Proposition 2.4 we have $d \le n$, and since $d \ne 1$ and $d \ne n$ we conclude that $2 \le d \le n - 1 = k$. Thus d satisfies the induction hypothesis, so there exists a prime p such that $p \mid d$. Since $p \mid d$ and $d \mid n$, by property 1 of Proposition 2.4 we conclude that $p \mid n$.

Theorem 2.8. (Euclid's Theorem, circa 300 BC) There are infinitely many prime numbers.

Proof. Suppose not, so that there are only finitely many prime numbers, say $p_1, p_2, \ldots, p_k$. Consider the number $q = p_1 p_2 \cdots p_k + 1$. Since $q \ge 2$, by Theorem 2.7 there exists some prime, say $p_i$, which divides q. On the other hand, $p_i \mid p_1 p_2 \cdots p_k$ and $p_i \nmid 1$. If $p_i$ divided q, then by property 3 of Proposition 2.4 it would also divide $q - p_1 p_2 \cdots p_k = 1$, so it must be the case that $p_i \nmid q$. This leads us to a contradiction. Hence there are infinitely many prime numbers.
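The two proofs above are constructive enough to try on a computer. The following Python sketch is an illustration under stated assumptions: the helper names `smallest_prime_factor` and `euclid_witness` are hypothetical, and trial division is used only as a simple way to find the prime divisor whose existence Theorem 2.7 guarantees. Given a finite list of primes, it forms $q = p_1 p_2 \cdots p_k + 1$ and returns a prime divisor of q, which cannot be in the list.

```python
def smallest_prime_factor(n):
    """Return the smallest prime p with p | n, for an integer n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d  # the first divisor found is necessarily prime
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def euclid_witness(primes):
    """Form q = (product of the given primes) + 1 and return (q, p),
    where p is the smallest prime divisor of q."""
    q = 1
    for p in primes:
        q *= p
    q += 1
    return q, smallest_prime_factor(q)


if __name__ == "__main__":
    print(euclid_witness([2, 3, 5]))
    print(euclid_witness([2, 3, 5, 7, 11, 13]))
```

Note that q itself need not be prime: for the list 2, 3, 5, 7, 11, 13 one gets q = 30031 = 59 · 509. The argument only needs some prime divisor of q, and that divisor is new.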

There are many alternative proofs of this fact, suggested by Euler, Erdős, Furstenberg, and other mathematicians (see the Wikipedia page for Euclid's Theorem). At the end of this section, we will see the proof given by Euler.

We will now turn our attention to the Fundamental Theorem of Arithmetic, which states that any integer greater than 1 can be written uniquely (up to reordering) as a product of primes.

Example 2.9. The number 60 can be written as $60 = 2^2 \cdot 3 \cdot 5$.

In order to prove the theorem, we will utilize the following tools:

1. Well-Ordering Principle. Let S be a non-empty subset of the natural numbers $\mathbb{N}$. Then S contains a smallest element. To spell it out, there exists $x \in S$ such that the inequality $x \le y$ holds for any $y \in S$.
2. Generalized Euclid's Lemma. Let p be a prime number and $a_1, a_2, \ldots, a_k$ be integers. If $p \mid a_1 a_2 \cdots a_k$, then there exists an index i, $1 \le i \le k$, such that $p \mid a_i$. (We will prove this result in Corollary 3.15, once we have introduced the notion of a greatest common divisor.)

Theorem 2.10. (The Fundamental Theorem of Arithmetic) Any integer greater than 1 can be written uniquely (up to reordering) as a product of primes.

Proof. We will start by proving that every integer greater than 1 can be written as a product of primes. Let S denote the collection of all integers greater than 1 that cannot be written as a product of primes. Suppose that S is not empty. Since $S \subseteq \mathbb{N}$ and $\mathbb{N}$ is well-ordered, we conclude that S contains a smallest element, say n. Clearly, n is not a prime. Thus there exists a positive integer d such that $d \mid n$, $d \ne 1$ and $d \ne n$. Thus both d and n/d are strictly less than n and greater than 1. Furthermore, at least one of d and n/d cannot be written as a product of primes, for otherwise n would be a product of primes. Thus either d or n/d is in S, which contradicts the fact that n is the smallest element in S. This means that S is empty, so every integer greater than 1 is a product of primes.

To prove uniqueness, consider two prime power decompositions

$n = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k} = q_1^{b_1} q_2^{b_2} \cdots q_l^{b_l}$.

We will show that they are in fact the same. (Note that this is not a proof by contradiction, for we do not assume that these prime power decompositions are distinct.) Without loss of generality, we may assume that $p_1 < p_2 < \ldots < p_k$ and $q_1 < q_2 < \ldots < q_l$. Pick some index i such

that $1 \le i \le k$. Since $p_i \mid n = q_1^{b_1} q_2^{b_2} \cdots q_l^{b_l}$, by the Generalized Euclid's Lemma there exists some index $j(i)$, $1 \le j(i) \le l$, such that $p_i \mid q_{j(i)}^{b_{j(i)}}$. Now apply the Generalized Euclid's Lemma once again to deduce that $p_i \mid q_{j(i)}$. Since $q_{j(i)}$ is prime, its only positive divisors are 1 and $q_{j(i)}$, which means that $p_i = q_{j(i)}$. Since $p_1 < p_2 < \ldots < p_k$, we see that $j(i_1) \ne j(i_2)$ whenever $i_1 \ne i_2$.

From the above we conclude that to each i with $1 \le i \le k$ there corresponds an index $j(i)$ with $1 \le j(i) \le l$, and distinct indices i give distinct indices $j(i)$. This means that there are at least as many j's as there are i's, so $k \le l$. Apply the Generalized Euclid's Lemma once again, but with the roles of $p_i$ and $q_j$ reversed: to each j with $1 \le j \le l$ there corresponds an index $i(j)$ with $1 \le i(j) \le k$, and distinct indices j give distinct indices $i(j)$, so $l \le k$. Since $k \le l$ and $l \le k$, it is the case that $k = l$. Every $p_i$ equals some $q_j$ and both lists have the same length, so the two lists contain the same primes; as both are written in increasing order, $p_i = q_i$ for every i.

From here we deduce that $p_i^{a_i} \mid q_i^{b_i}$ and $q_i^{b_i} \mid p_i^{a_i}$. By property 5 of Proposition 2.4, we have $p_i^{a_i} = q_i^{b_i}$. Since $p_i = q_i$, it is the case that $a_i = b_i$.

The fact that the prime factorization is unique was utilized by Euler to provide an alternative proof of Euclid's Theorem.

Theorem 2.9. (Euclid's Theorem, circa 300 BC) There are infinitely many prime numbers.

Proof. (Euler's proof, 1700s) Consider the harmonic series

$\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots$

It is widely known that this series is divergent. Now let $p > 1$ and recall the formula for the infinite geometric series:

$\sum_{k=0}^{\infty} \frac{1}{p^k} = 1 + \frac{1}{p} + \frac{1}{p^2} + \cdots = \frac{1}{1 - 1/p}$.

Using this formula, we observe that

$\prod_{p \text{ prime}} \frac{1}{1 - 1/p} = \prod_{p \text{ prime}} \left(1 + \frac{1}{p} + \frac{1}{p^2} + \cdots\right) = \sum_{n=1}^{\infty} \frac{1}{n}$,

where the last equality holds by the Fundamental Theorem of Arithmetic. If there were only finitely many primes, the product on the left hand side would be finite, which contradicts the fact that the series on the right hand side is divergent.

3 Greatest Common Divisor. Least Common Multiple. Bézout's Lemma

When divisibility fails, we speak of quotients and remainders.

Theorem 3.1. (The Remainder Theorem; Proposition 1.3 in Frank Zorzitto, A Taste of Number Theory) Let a, b be integers, $a > 0$. Then there exist unique integers q and r such that

$b = aq + r$,

where $0 \le r < a$.

Proof. Recall that every real number x sits in between two consecutive integers; that is, there exists some unique integer q such that

$q \le x < q + 1$.

Now set $x = b/a$. Then from the above inequality it follows that

$aq \le b < aq + a$.

But then

$0 \le b - aq < a$.

If we now put $r = b - aq$, then $b = aq + r$ and r satisfies $0 \le r < a$. From the above construction it is also evident that q and r are unique, so the result follows.

Definition 3.2. Let a, b be integers, $a > 0$. Write $b = aq + r$, where $0 \le r < a$. Then a is called the modulus, b is called the dividend, q is called the quotient and r is called the remainder.
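The construction in the proof of Theorem 3.1 translates directly into code. The Python sketch below is illustrative, and the function name `quotient_remainder` is a hypothetical choice. It takes $q = \lfloor b/a \rfloor$ via Python's floor division `//`, which rounds towards minus infinity, and sets $r = b - aq$, so that $0 \le r < a$ also holds for negative b.

```python
def quotient_remainder(a, b):
    """Return (q, r) with b == a*q + r and 0 <= r < a, for a modulus a > 0."""
    if a <= 0:
        raise ValueError("the modulus a must be positive")
    q = b // a     # floor of b/a, the unique integer with q <= b/a < q + 1
    r = b - a * q  # the remainder, as in the proof of the theorem
    return q, r


if __name__ == "__main__":
    print(quotient_remainder(300, 440))  # 440 = 300*1 + 140
    print(quotient_remainder(3, -7))     # -7 = 3*(-3) + 2
```

The second call shows why the floor matters: truncating $-7/3$ towards zero would give $q = -2$ and the negative "remainder" $-1$, which violates $0 \le r < a$.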

Note that for $a > 0$ the expression $a \mid b$ simply means that in $b = aq + r$ the remainder r is equal to zero. Given a and b, one can easily compute q and r using a calculator. First, compute $b/a$; rounding this value down to the nearest integer gives precisely your q. Then compute r with the formula $r = b - aq$.

Definition 3.3. Let a and b be integers. An integer d such that $d \mid a$ and $d \mid b$ is called a common divisor of a and b. When at least one of a and b is not zero, the largest integer with such a property is called the greatest common divisor of a and b and is denoted by gcd(a,b). When $a = b = 0$, we define gcd(a,b) := 0.

The greatest common divisor of a and b possesses many interesting properties. Let us demonstrate several of them.

Proposition 3.4. Let

$a = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ and $b = p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k}$,

where $p_1, p_2, \ldots, p_k$ are distinct prime numbers and $e_1, e_2, \ldots, e_k, f_1, f_2, \ldots, f_k$ are integers $\ge 0$. Then

$\gcd(a,b) = p_1^{\min\{e_1, f_1\}} p_2^{\min\{e_2, f_2\}} \cdots p_k^{\min\{e_k, f_k\}}$.   (1)

Further, any common divisor c of a and b must also divide gcd(a,b).

Proof. Note that

$g = p_1^{\min\{e_1, f_1\}} p_2^{\min\{e_2, f_2\}} \cdots p_k^{\min\{e_k, f_k\}}$

divides both a and b. Also, any integer

$c = p_1^{g_1} p_2^{g_2} \cdots p_k^{g_k}$

such that $g_i > \min\{e_i, f_i\}$ for some i fails to divide either a or b. Hence any common divisor c satisfies $g_i \le \min\{e_i, f_i\}$ for all i, $1 \le i \le k$. Hence c divides g. Since g is itself a common divisor and every common divisor divides g, we get that g is in fact the greatest common divisor.

Note that Proposition 3.4 suggests one formula for the computation of gcd(a,b). First, one has to factor a and b by writing them in the form

$a = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ and $b = p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k}$,

where the exponents $e_i$ and $f_j$ are allowed to be 0 (convince yourself that any two numbers can be written in this form). Then one might simply utilize the formula (1). This approach works fine when the numbers are small and easily factorable, but unfortunately, as the numbers get really large, efficient factorization is infeasible for modern electronic computers (but feasible for quantum computers, see Shor's Algorithm). In fact, the security of the RSA public key cryptosystem is based on the difficulty of factorization.

Example 3.5. Let us compute the greatest common divisor of 440 and 300. The prime factorizations are

$440 = 2^3 \cdot 5 \cdot 11$ and $300 = 2^2 \cdot 3 \cdot 5^2$.

We see that

$440 = 2^3 \cdot 3^0 \cdot 5^1 \cdot 11^1$ and $300 = 2^2 \cdot 3^1 \cdot 5^2 \cdot 11^0$.

Thus

$\gcd(440, 300) = 2^{\min\{3,2\}} \cdot 3^{\min\{0,1\}} \cdot 5^{\min\{1,2\}} \cdot 11^{\min\{1,0\}} = 2^2 \cdot 3^0 \cdot 5^1 \cdot 11^0 = 20$.

Exercise 3.6. Let a and b be integers. An integer l is called a common multiple of a and b if it satisfies $a \mid l$ and $b \mid l$. The smallest non-negative integer with such a property is called the least common multiple of a and b and is denoted by lcm(a,b). Given the statement as in Proposition 3.4, prove that

$\mathrm{lcm}(a,b) = p_1^{\max\{e_1, f_1\}} p_2^{\max\{e_2, f_2\}} \cdots p_k^{\max\{e_k, f_k\}}$   (2)

and that every common multiple c of a and b is divisible by lcm(a,b). That is, if $a \mid c$ and $b \mid c$, then $\mathrm{lcm}(a,b) \mid c$.

Exercise 3.7. Let a and b be non-negative integers. Prove that

$ab = \gcd(a,b) \, \mathrm{lcm}(a,b)$.   (3)

Exercise 3.8. Compute lcm(440,300) using formulas (2) and (3).

We will now address the following question: which integers c can be written in the form $ax + by$, where x and y are integers? Speaking in fancy mathematical language, the identity $c = ax + by$ means that c is an integer (linear) combination of a and b.

Let us play around a little bit with the quantity $ax + by$. Clearly, a can be written in this form, since $a = a \cdot 1 + b \cdot 0$. The same applies to b, since $b = a \cdot 0 + b \cdot 1$. The number 0 can always be represented in this form, since $0 = a \cdot 0 + b \cdot 0$. Note that, when at least one of a and b is not zero, $ax + by$ will always represent at least one positive integer, because $a \cdot a + b \cdot b > 0$. It turns out that the least positive integer d represented by $ax + by$ is precisely the greatest common divisor of a and b.
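The last claim can be explored experimentally before it is proved. The Python sketch below is an illustration only: the function name `least_positive_combination` and the search window `bound` are assumptions, and a finite search can of course miss combinations whose coefficients lie outside the window. It scans all pairs (x, y) with $|x|, |y| \le$ `bound` and records the smallest positive value of $ax + by$, which can then be compared with `math.gcd`.

```python
from math import gcd


def least_positive_combination(a, b, bound=50):
    """Smallest positive value of a*x + b*y over |x|, |y| <= bound.

    Brute-force illustration; returns None if no positive value occurs
    (which happens only when a == b == 0).
    """
    best = None
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            value = a * x + b * y
            if value > 0 and (best is None or value < best):
                best = value
    return best


if __name__ == "__main__":
    for a, b in [(440, 300), (7, 15), (7, 14)]:
        print(a, b, least_positive_combination(a, b), gcd(a, b))
```

For the pair (440, 300) the minimum 20 is attained, for instance, at (x, y) = (-2, 3), since 440 · (-2) + 300 · 3 = 20.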

Example 3.9. Consider $a = 7$ and $b = 15$. Then the equation

$7x + 15y = 1$

has a solution $(x, y) = (-2, 1)$. In fact, it has infinitely many solutions, as any pair of the form $(x, y) = (-2 + 15n, 1 - 7n)$ for $n \in \mathbb{Z}$ is a solution, too. However, when $a = 7$ and $b = 14$ the equation

$7x + 14y = 1$

has no solutions, as the left hand side will always be divisible by 7, while this is not the case for the right hand side. So the number 1 cannot be represented as an integer linear combination of 7 and 14. Hence the question: which numbers can be represented in this form?

Theorem 3.10. (Bézout's lemma; Proposition 1.4 in Frank Zorzitto, A Taste of Number Theory) Let a, b be integers such that $a \ne 0$ or $b \ne 0$. If d is the least positive integer combination of a and b, then d divides every combination of a and b. Furthermore, $d = \gcd(a,b)$.

Proof. Write $d = ax + by > 0$ for some integers x, y. Now consider some integer combination $c = as + bt$, where $s, t \in \mathbb{Z}$. We want to show that $d \mid c$. Recall that $c = dq + r$ for some $q, r \in \mathbb{Z}$, where $0 \le r < d$. Thus

$0 \le r = c - dq = as + bt - (ax + by)q = a(s - xq) + b(t - yq) < d$.

We see that r is an integer combination of a and b, which is less than d, and non-negative. Because d is the least positive integer combination of a and b, the only option is that $r = 0$. Hence $d \mid c$. In particular, $d \mid a$ and $d \mid b$, because a, b are integer combinations of a and b.

We will now show that $d = \gcd(a,b)$. On one hand, we know that $d \mid a$ and $d \mid b$, so d is a common divisor of a and b. By Proposition 3.4, d must divide the greatest common divisor of a and b, i.e. $d \mid \gcd(a,b)$. On the other hand, writing $a = \gcd(a,b) a_1$ and $b = \gcd(a,b) b_1$ with $a_1, b_1 \in \mathbb{Z}$, we have $d = ax + by = \gcd(a,b)(a_1 x + b_1 y)$, so $\gcd(a,b) \mid d$. Since $d \mid \gcd(a,b)$ and $\gcd(a,b) \mid d$, and both are positive, by property 5 of Proposition 2.4 we conclude that $d = \gcd(a,b)$.

With the help of Theorem 3.10 we can answer the question of which numbers can be represented in the form $ax + by$. Since $\gcd(a,b) = ax + by$ for some $x, y \in \mathbb{Z}$ and gcd(a,b) is the smallest positive integer representable in this form, we see that any integer c divisible by gcd(a,b) can be written in such a way, since for some $k \in \mathbb{Z}$ it is the case that

$c = k \gcd(a,b) = k(ax + by) = a(kx) + b(ky)$.

On the other hand, if $\gcd(a,b) \nmid c$, then c cannot be written as an integer combination of a and b, because gcd(a,b) divides every such combination.

We will now use Bézout's lemma to establish a few more properties of prime numbers. In particular, we will prove Euclid's lemma, which we already saw in Section 2.

Definition 3.11. Let a and b be integers. We say that a and b are coprime if $\gcd(a,b) = 1$.

Proposition 3.12. Let a, b, c be integers with a, b coprime. If $a \mid c$ and $b \mid c$, then $ab \mid c$.

Proof. Since a and b are coprime, by Bézout's lemma there exist integers x and y such that $ax + by = 1$. Thus

$a(cx) + b(cy) = c$.

After dividing both sides of the above equality by ab we obtain

$\frac{c}{b} x + \frac{c}{a} y = \frac{c}{ab}$.

Since $a \mid c$ and $b \mid c$, the quantity on the left hand side of the above equality is an integer. Hence the same applies to the quantity on the right hand side, so $c/(ab)$ is an integer.

Proposition 3.13. Let a, b, c be integers with a, b coprime. If $a \mid bc$, then $a \mid c$.

Proof. Since a and b are coprime, by Bézout's lemma there exist integers x and y such that $ax + by = 1$. Thus

$a(cx) + b(cy) = c$.

After dividing both sides of the above equality by a we obtain

$cx + \frac{bc}{a} y = \frac{c}{a}$.

Since $a \mid bc$, the quantity on the left hand side of the above equality is an integer. Hence the same applies to the quantity on the right hand side, so $c/a$ is an integer.

Proposition 3.14. (Euclid's lemma) If p is prime and $p \mid ab$ for some integers a, b, then $p \mid a$ or $p \mid b$. (The proof is from Frank Zorzitto's A Taste of Number Theory; see Proposition 2.4 there.)

Proof. Say $p \nmid a$. Let $d = \gcd(p, a)$. Since $d \mid p$, the definition of primes forces $d = 1$ or $d = p$, and since $p \nmid a$, it must be that $d = 1$, so p and a are coprime. From Proposition 3.13 it follows that $p \mid b$.

Corollary 3.15. (Generalized Euclid's lemma) Let p be a prime number and $a_1, a_2, \ldots, a_k$ be integers. If $p \mid a_1 a_2 \cdots a_k$, then there exists an index i, $1 \le i \le k$, such that $p \mid a_i$.

Proof. The result clearly holds for $k = 1$, so assume that $k \ge 2$. If $p \mid a_1$, we are done. If not, then p and $a_1$ are coprime, so by Proposition 3.13 it must be the case that $p \mid a_2 a_3 \cdots a_k$. If $p \mid a_2$, we are done. If not, then p and $a_2$ are coprime, so by Proposition 3.13 it must be the case that $p \mid a_3 a_4 \cdots a_k$. Proceeding in the same fashion, we eventually reach $p \mid a_{k-1} a_k$, where we may apply Euclid's lemma to draw the desired conclusion.

Exercise 3.16. Show that one cannot remove the coprimality condition from either Proposition 3.12 or Proposition 3.13.

4 Diophantine Equations. The Linear Diophantine Equation ax + by = c

An equation is called Diophantine if we are only concerned with its integer solutions. Any equation can be converted into its Diophantine form. For example, instead of looking at

$x^2 + y^2 = 1$

for $(x, y) \in \mathbb{R}^2$ we may restrict our attention to $(x, y) \in \mathbb{Z}^2$. Note that in the former case there are infinitely many solutions (in fact, there are uncountably many of them). These are all the points lying on the circle centered at the origin with radius equal to 1. However, if we look at $(x, y) \in \mathbb{Z}^2$ then there are only four solutions, namely $(\pm 1, 0)$ and $(0, \pm 1)$. (Do you see why?)

Sometimes, converting an equation into its Diophantine form is not very interesting. This is the case for the equation $x^2 + y^2 = 1$. Another example is the equation $y = x\sqrt{2}$, which has no integer solutions aside from (0,0) due to the irrationality of $\sqrt{2}$. But sometimes understanding integer solutions can get difficult, even extremely difficult. The reason is that, when considering an equation over the real numbers $\mathbb{R}$ or, even better, over the complex numbers $\mathbb{C}$, there are many analytical tools that we can utilize. Say, if we are looking at the equation $f(x) = 0$ for $x \in \mathbb{R}$, we might utilize the fact that $f(x)$ is continuous, or differentiable, or maybe even smooth.

Another reason why it might be easier to analyze equations not only over $\mathbb{R}$ or $\mathbb{C}$, but also over $\mathbb{Q}$, is that all of them are fields. Quite often we can say many things about a Diophantine equation by lifting it and considering it, for example, over $\mathbb{Q}$. For example, if we can show that there are finitely many solutions over $\mathbb{Q}$, then there are finitely many solutions over $\mathbb{Z}$ as well. This applies, for instance, to hyperelliptic equations (see Faltings's Theorem). However, sometimes there are infinitely many solutions over $\mathbb{Q}$, but only finitely many, or even none, over $\mathbb{Z}$. For example, the fact that $\mathbb{Q}$ is a field can be utilized to prove that there are infinitely many rational solutions to the elliptic equations

$y^2 = x^3 + 46$,   $y^2 = x^3 - 2$.

For example, the first equation has a solution $(-7/4, 51/8)$, while the second equation has a solution $(129/100, 383/1000)$. Unlike $\mathbb{Q}$, $\mathbb{R}$ or $\mathbb{C}$, the ring of integers $\mathbb{Z}$ is not closed under division by a non-zero element, so we need to use different techniques to analyze it. For example, the equation $y^2 = x^3 + 46$ has no solutions in integers, while the equation $y^2 = x^3 - 2$ has two solutions $(3, \pm 5)$.
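The rational points quoted above can be checked with exact arithmetic. The Python sketch below is an illustration under explicit assumptions: both curves are taken to have the form $y^2 = x^3 + k$, and the constant k = 46 for the first curve is inferred as the value for which the stated point $(-7/4, 51/8)$ lies on it. The helper names `on_curve` and `integer_points` are hypothetical, and the integer search covers only a finite window of x, so it illustrates the statements about integer solutions rather than proving them.

```python
from fractions import Fraction
from math import isqrt


def on_curve(x, y, k):
    """Exact test of y^2 == x^3 + k for integers or Fractions."""
    return y * y == x ** 3 + k


def integer_points(k, bound):
    """All integer points (x, y) on y^2 = x^3 + k with |x| <= bound."""
    points = []
    for x in range(-bound, bound + 1):
        rhs = x ** 3 + k
        if rhs < 0:
            continue  # a square cannot be negative
        y = isqrt(rhs)
        if y * y == rhs:
            points.extend([(x, -y), (x, y)] if y > 0 else [(x, 0)])
    return points


if __name__ == "__main__":
    print(on_curve(Fraction(-7, 4), Fraction(51, 8), 46))
    print(on_curve(Fraction(129, 100), Fraction(383, 1000), -2))
    print(integer_points(-2, 50))
    print(integer_points(46, 50))
```

Using `Fraction` rather than floating point matters here: the check is an exact equality of rational numbers, which rounding would spoil.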

Example 4.1. Let a, b, c, n be fixed integers, $n \ge 3$, and x, y, z be integer variables. Here are a few examples of Diophantine equations:

$ax + by = c$            Linear Diophantine equation in two variables;
$x^2 + y^2 = z^2$        Pythagorean equation;
$x^2 - dy^2 = \pm 1$     Pell equation;
$y^2 = x^3 + ax + b$     Weierstrass equation;
$ax^n + by^n = c$        Thue equation;
$ax^n + by^n = cz^n$     Fermat type equation;
$x^2 + 7 = 2^y$          Ramanujan-Nagell equation.

When analyzing equations, we would like to answer the following questions:

1. Do solutions exist?
2. If solutions exist, how many of them are there? (finitely many, countably many, uncountably many)
3. What are the solutions?
4. Are there algorithms that allow one to produce solutions?

We address the same questions when analyzing Diophantine equations. Of course, in this case the number of solutions will be at most countable.

We will now turn our attention to the linear Diophantine equation in two variables

$ax + by = c$.

Here a, b, c are fixed integers and x, y are integer variables. We will fully classify the solutions to this equation. The question of existence of a solution was fully resolved at the end of Section 3, where we established that solutions exist if and only if $\gcd(a,b) \mid c$. Thus the only thing that is left for us to do is to find all the solutions when they exist, and come up with a procedure for their computation. As the following Proposition shows, by knowing one solution to $ax + by = c$ we can deduce all of the solutions.

Proposition 4.2. Let a, b, c be integers. Let (x, y) be a pair of integers such that

$ax + by = c$.

Then any pair of integers $(x', y')$ such that $c = ax' + by'$ must be of the form

$(x', y') = \left( x - \frac{b}{\gcd(a,b)} n, \; y + \frac{a}{\gcd(a,b)} n \right)$,

where n ranges over the integers.

Proof. Suppose that $(x, y)$ and $(x', y')$ are integer pairs such that $c = ax + by = ax' + by'$. Then

$a(x - x') = b(y' - y)$.

This means that $a \mid b(y' - y)$. Dividing through by gcd(a,b), we see that $a/\gcd(a,b)$ divides $(b/\gcd(a,b))(y' - y)$. Since $a/\gcd(a,b)$ and $b/\gcd(a,b)$ are coprime, Proposition 3.13 further gives

$\frac{a}{\gcd(a,b)} \mid (y' - y)$.

This means that

$y' = y + \frac{a}{\gcd(a,b)} n$

for some $n \in \mathbb{Z}$. Substituting this relation into the equation $a(x - x') = b(y' - y)$, we see that

$a(x - x') = \frac{ab}{\gcd(a,b)} n$,

which means that $x' = x - \frac{b}{\gcd(a,b)} n$.

Thus we see that from one solution to $ax + by = c$ (if it exists) we may produce all solutions once we compute gcd(a,b). In order to determine one solution to this equation, we use the Extended Euclidean Algorithm. This algorithm allows one to compute a pair of integers (x, y) such that

$ax + by = \gcd(a,b)$.

This allows us to produce a solution to $ax + by = c$ whenever one exists, as then it must be the case that $\gcd(a,b) \mid c$, so for some integer k we have

$c = k \gcd(a,b) = k(ax + by) = a(kx) + b(ky)$.

We may then use Proposition 4.2 to compute all solutions to the linear Diophantine equation $ax + by = c$. We will learn about the Extended Euclidean Algorithm in the following section.
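The parametrization in Proposition 4.2 is easy to turn into a short program. The following Python sketch is illustrative: the function name `solutions_from_one` is a hypothetical choice, and the starting solution $(x_0, y_0)$ is assumed to be known already (here the pair $(-2, 1)$ for $7x + 15y = 1$). It lists the solutions corresponding to a chosen range of the parameter n.

```python
from math import gcd


def solutions_from_one(a, b, x0, y0, n_values):
    """Given one solution (x0, y0) of a*x + b*y = c, return the solutions
    (x0 - (b/g)*n, y0 + (a/g)*n) for each n in n_values, where g = gcd(a, b).

    Assumes a and b are not both zero.
    """
    g = gcd(a, b)
    return [(x0 - (b // g) * n, y0 + (a // g) * n) for n in n_values]


if __name__ == "__main__":
    a, b, c = 7, 15, 1
    for x, y in solutions_from_one(a, b, -2, 1, range(-2, 3)):
        print((x, y), a * x + b * y == c)
```

Every pair printed satisfies the equation; by the proposition, letting n run over all integers exhausts the solution set.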

Exercise 4.3. Let $a_1, a_2, \ldots, a_k$ be integers at least one of which is not 0. The largest integer d such that $d \mid a_i$ for all i, $1 \le i \le k$, is called the greatest common divisor of $a_1, a_2, \ldots, a_k$. It is denoted by $\gcd(a_1, a_2, \ldots, a_k)$. When $a_1 = a_2 = \ldots = a_k = 0$, we define $\gcd(a_1, a_2, \ldots, a_k) := 0$. Determine the formulas for $\gcd(a_1, a_2, \ldots, a_k)$ and $\mathrm{lcm}(a_1, a_2, \ldots, a_k)$ that are analogous to (1) and (2). Does a formula similar to (3) hold? Explain why or why not.

Exercise 4.4. Let $a_1, a_2, \ldots, a_k$ be integers. We say that $c \in \mathbb{Z}$ can be represented as an integer linear combination of $a_1, a_2, \ldots, a_k$ if there exist $x_1, x_2, \ldots, x_k \in \mathbb{Z}$ such that

$c = a_1 x_1 + a_2 x_2 + \cdots + a_k x_k$.

Given integers $a_1, a_2, \ldots, a_k$, which integers can be written as an integer combination of $a_1, a_2, \ldots, a_k$?

5 Euclidean Algorithm. Extended Euclidean Algorithm

Let a, b be integers at least one of which is not 0. In Section 3, we found one formula for the computation of gcd(a,b), namely (1). Though useful, it is not very efficient, as no fast (that is, polynomial-time) algorithm for integer factorization is known. However, there exists a much more efficient algorithm to compute gcd(a,b), described by Euclid in his fundamental work Elements. It is called the Euclidean Algorithm.

We begin our explorations by first showing yet another interesting property of the greatest common divisor. In particular, if a, b are integers at least one of which is not zero, then gcd(a,b) does not change if we replace b with $b + ak$, where k is an arbitrary integer.

Proposition 5.1. Suppose a, b are two integers. Then for any integer k it is the case that

$\gcd(a,b) = \gcd(a, b + ak)$.

Proof. Let $d_1 = \gcd(a,b)$ and $d_2 = \gcd(a, b + ak)$. We will show that $d_1 \mid d_2$ and $d_2 \mid d_1$.

Since $d_1 \mid a$ and $d_1 \mid b$, it is the case that $d_1 \mid (b + ak)$. Since $d_1$ is a common divisor of a and $b + ak$, by Proposition 3.4 it must divide their greatest common divisor $d_2$. Thus $d_1 \mid d_2$.

Now observe that $d_2 \mid a$ and $d_2 \mid b + ak$. Thus $a = d_2 r_1$ and $b + ak = d_2 r_2$ for some $r_1, r_2 \in \mathbb{Z}$. But then

$b = d_2 r_2 - ak = d_2 r_2 - d_2 r_1 k = d_2 (r_2 - r_1 k)$.

Hence $d_2 \mid b$, which means that $d_2$ is a common divisor of a and b. By Proposition 3.4 it must divide their greatest common divisor $d_1$. Thus $d_2 \mid d_1$. Since $d_1 \mid d_2$ and $d_2 \mid d_1$, and both are non-negative, we conclude that $d_1 = d_2$.

We will now describe the Euclidean Algorithm. Let a, b be positive integers (so that $ab \ne 0$; when $ab = 0$ it is easy to compute gcd(a,b)). Without loss of generality, we suppose that $a > b$ (if $a < b$ we may interchange a and b, and if $a = b$ then $\gcd(a,b) = a$). We define the finite sequence of integers $r_1, r_2, \ldots$ as follows. Set $r_1 = a$, $r_2 = b$, and write

$r_1 = q_1 r_2 + r_3$.

Note that the remainder $r_3$ satisfies $0 \le r_3 < r_2 = b$. Then compute

$r_2 = q_2 r_3 + r_4$,
$r_3 = q_3 r_4 + r_5$,

and so on. Since the sequence of integers $r_1 > r_2 > \ldots$ is strictly decreasing and bounded below by 0, after finitely many steps it reaches some smallest positive number $r_n$. We will show that this smallest positive integer $r_n$ is precisely gcd(a,b).

Why does this process allow one to compute gcd(a,b)? By Proposition 5.1,

$\gcd(r_1, r_2) = \gcd(r_1 - q_1 r_2, r_2) = \gcd(r_3, r_2)$.

Let us compute one more step:

$\gcd(r_3, r_2) = \gcd(r_3, r_2 - q_2 r_3) = \gcd(r_3, r_4)$.

Proceeding in the same fashion, we see that

$\gcd(a,b) = \gcd(r_1, r_2) = \gcd(r_2, r_3) = \ldots = \gcd(r_i, r_{i+1})$

for all i such that $1 \le i \le n - 1$. We see that the calculations get easier with each step, and in the end we obtain

$\gcd(a,b) = \gcd(r_1, r_2) = \ldots = \gcd(r_{n-1}, r_n) = \gcd(r_n, 0) = r_n$.

Theorem 5.2. Let a, b be positive integers with $a > b$. Let $r_1 > r_2 > \ldots$ be the finite sequence as defined above. Let $r_n$ be the smallest positive integer in this sequence. Then $r_n = \gcd(a,b)$.

Proof. Recall that $d = \gcd(a,b) = \gcd(r_i, r_{i+1})$ for $i = 1, 2, \ldots, n - 1$. Now consider the last equation

$r_{n-2} = q_{n-2} r_{n-1} + r_n$.

The remainder in the expression $r_{n-1} = q_{n-1} r_n + r_{n+1}$ satisfies $0 \le r_{n+1} < r_n$. Since $r_n$ is the smallest positive integer in this sequence and the sequence is strictly decreasing, the only possibility is that $r_{n+1} = 0$, which means that $r_n$ divides $r_{n-1}$. But then

$r_n = \gcd(r_{n-1}, r_n) = \gcd(r_{n-2}, r_{n-1}) = \ldots = \gcd(r_1, r_2) = \gcd(a,b)$.

Consider several examples.

Example 5.3. Let us compute gcd(440,300) using the Euclidean Algorithm. We have

$440 = 1 \cdot 300 + 140$
$300 = 2 \cdot 140 + 20$
$140 = 7 \cdot 20 + 0$

Thus gcd(440,300) = 20.

Example 5.4. Let us compute gcd(233,144) using the Euclidean Algorithm. We have

$233 = 1 \cdot 144 + 89$
$144 = 1 \cdot 89 + 55$
$89 = 1 \cdot 55 + 34$
$55 = 1 \cdot 34 + 21$
$34 = 1 \cdot 21 + 13$
$21 = 1 \cdot 13 + 8$
$13 = 1 \cdot 8 + 5$
$8 = 1 \cdot 5 + 3$
$5 = 1 \cdot 3 + 2$
$3 = 1 \cdot 2 + 1$
$2 = 2 \cdot 1 + 0$

Thus gcd(233,144) = 1.
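The examples above can be reproduced with a few lines of Python. This is a minimal sketch rather than a definitive implementation: the function name `euclid_steps` is a hypothetical choice, and each recorded tuple `(r_prev, q, r_cur, r_next)` stands for one line $r_{i-1} = q_{i-1} r_i + r_{i+1}$ of the computation, including the final line with remainder zero.

```python
def euclid_steps(a, b):
    """Run the Euclidean Algorithm on positive integers a, b.

    Returns (g, steps), where g = gcd(a, b) and steps is a list of tuples
    (r_prev, q, r_cur, r_next) with r_prev == q * r_cur + r_next.
    """
    steps = []
    while b != 0:
        q, r = divmod(a, b)  # a = q*b + r with 0 <= r < b
        steps.append((a, q, b, r))
        a, b = b, r          # shift the window one step along the sequence
    return a, steps


if __name__ == "__main__":
    for pair in [(440, 300), (233, 144)]:
        g, steps = euclid_steps(*pair)
        for r_prev, q, r_cur, r_next in steps:
            print(f"{r_prev} = {q} * {r_cur} + {r_next}")
        print("gcd =", g, "divisions =", len(steps))
```

Counting every division, including the last one with remainder zero, the run on (440, 300) has 3 lines and the run on (233, 144) has 11, matching the displays in Examples 5.3 and 5.4.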

Note that both numbers in Example 5.4 are smaller than those in Example 5.3. Nevertheless, in Example 5.4 the Euclidean Algorithm terminated in 11 steps, while in Example 5.3 it terminated in 3 steps. This is because in Example 5.4 we chose our integers to be the 12th and the 11th Fibonacci numbers. Recall that the Fibonacci numbers are defined recursively by $F_1 = 1$, $F_2 = 2$ and $F_n = F_{n-1} + F_{n-2}$ for $n \ge 3$. It turns out that the slowest performance of the Euclidean Algorithm is achieved for consecutive Fibonacci numbers. Nevertheless, the algorithm does work in polynomial time: in 1844, Gabriel Lamé proved that the number of steps required for the completion of the Euclidean Algorithm is at most five times the number of decimal digits of $\min\{a,b\}$.

Exercise 5.5. Let $F_1 = 1$, $F_2 = 2$, and for an integer $n \ge 3$ define $F_n = F_{n-1} + F_{n-2}$. The number $F_n$ is called the $n$-th Fibonacci number. Prove that the computation of $\gcd(F_{n+1}, F_n)$ with the Euclidean Algorithm requires $n$ steps.

Above we managed to compute $\gcd(a,b)$. Still, we do not know how to produce integer solutions $(x,y)$ to the Diophantine equation $ax + by = \gcd(a,b)$. This can be achieved with the help of the Extended Euclidean Algorithm. It is essentially the same as the Euclidean Algorithm, but along with the sequence $r_1, r_2, \ldots$ we will also keep track of two additional sequences $s_1, s_2, \ldots$ and $t_1, t_2, \ldots$ The algorithm is as follows. Set
\[ r_1 = a, \; r_2 = b; \qquad s_1 = 1, \; s_2 = 0; \qquad t_1 = 0, \; t_2 = 1. \]
For $i \ge 2$, we proceed by computing
\[
\begin{aligned}
r_{i+1} &= r_{i-1} - q_{i-1} r_i; \\
s_{i+1} &= s_{i-1} - q_{i-1} s_i; \\
t_{i+1} &= t_{i-1} - q_{i-1} t_i,
\end{aligned}
\]
where $q_{i-1}$ is the quotient of the division of $r_{i-1}$ by $r_i$. Note that, out of the three lines above, the Euclidean Algorithm computes only the first one. We claim that, if $r_n$ is the last non-zero remainder produced by the Euclidean Algorithm, then the integers $s_n$ and $t_n$ satisfy $as_n + bt_n = \gcd(a,b)$.

Theorem 5.6. Let $a, b$ be positive integers with $a > b$. Let $r_1 > r_2 > \ldots > r_n > 0$, $s_1, s_2, \ldots, s_n$ and $t_1, t_2, \ldots, t_n$ be sequences as defined above. Then $as_n + bt_n = \gcd(a,b)$.
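The three recurrences can be carried out side by side. Below is a minimal Python sketch under the setup above (the name `extended_euclid` is an illustrative assumption, not the notes' own code). It keeps only the two most recent terms of each of the sequences $r_i$, $s_i$, $t_i$.

```python
def extended_euclid(a, b):
    """Return (g, s, t) with a*s + b*t == g == gcd(a, b), for a > b > 0."""
    r_prev, r_cur = a, b      # r_1, r_2
    s_prev, s_cur = 1, 0      # s_1, s_2
    t_prev, t_cur = 0, 1      # t_1, t_2
    while r_cur != 0:
        q = r_prev // r_cur   # quotient of the current division
        r_prev, r_cur = r_cur, r_prev - q * r_cur
        s_prev, s_cur = s_cur, s_prev - q * s_cur
        t_prev, t_cur = t_cur, t_prev - q * t_cur
    # r_prev is now the last non-zero remainder r_n
    return r_prev, s_prev, t_prev


g, s, t = extended_euclid(440, 300)
print(g, s, t, 440 * s + 300 * t)
```

For $a = 440$ and $b = 300$ the sketch returns $g = 20$ together with coefficients $s = -2$ and $t = 3$, and indeed $440 \cdot (-2) + 300 \cdot 3 = 20$.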

Proof. We claim that the equation $as_i + bt_i = r_i$ is satisfied for all $i = 1, 2, \ldots, n$. Since Theorem 5.2 asserts that $r_n = \gcd(a,b)$, this would imply the result. To prove this statement, we proceed using induction on $i$.

Base case. According to our setup, $r_1 = a$, $r_2 = b$, $s_2 = t_1 = 0$ and $s_1 = t_2 = 1$. Thus $as_1 + bt_1 = r_1$ and $as_2 + bt_2 = r_2$, so the base case holds for $i = 1, 2$.

Induction hypothesis. Assume that $as_i + bt_i = r_i$ for $i = k-1, k$, where $2 \le k < n$.

Induction step. We will demonstrate that the result holds for $i = k + 1$:
\[
\begin{aligned}
r_{k+1} &= r_{k-1} - q_{k-1} r_k \\
&= (as_{k-1} + bt_{k-1}) - q_{k-1}(as_k + bt_k) \\
&= a(s_{k-1} - q_{k-1} s_k) + b(t_{k-1} - q_{k-1} t_k) \\
&= as_{k+1} + bt_{k+1}.
\end{aligned}
\]
We conclude that $as_i + bt_i = r_i$ for all $i = 1, 2, \ldots, n$, as claimed.

Using the Extended Euclidean Algorithm, we can finally solve the Diophantine equation $ax + by = c$.

Example 5.7. Let us determine all solutions to the Diophantine equation $440x + 300y = 80$ using the Extended Euclidean Algorithm. Set
\[ r_1 = 440, \; r_2 = 300; \qquad s_1 = 1, \; s_2 = 0; \qquad t_1 = 0, \; t_2 = 1. \]

Step 1. $440 = 1 \cdot 300 + 140$, so $q_1 = 1$ and $r_3 = 140$. Thus
\[ s_3 = s_1 - q_1 s_2 = 1 - 1 \cdot 0 = 1; \qquad t_3 = t_1 - q_1 t_2 = 0 - 1 \cdot 1 = -1. \]

Step 2. $300 = 2 \cdot 140 + 20$, so $q_2 = 2$ and $r_4 = 20$. Thus
\[ s_4 = s_2 - q_2 s_3 = 0 - 2 \cdot 1 = -2; \qquad t_4 = t_2 - q_2 t_3 = 1 - 2 \cdot (-1) = 3. \]

Step 3. Since $20 \mid 140$, the algorithm terminates. We conclude that
\[ 440 \cdot (-2) + 300 \cdot 3 = 20. \]
After multiplying both sides of the above equality by 4, we obtain a solution $(x, y) = (-8, 12)$ to the Diophantine equation $440x + 300y = 80$. By Proposition 4.2, if $a = 440$ and $b = 300$ then all solutions to this Diophantine equation must be of the form
\[ \left( x - \frac{b}{\gcd(a,b)}\, n, \; y + \frac{a}{\gcd(a,b)}\, n \right) = (-8 - 15n, \; 12 + 22n), \]
where $n$ ranges over the integers.

Exercise 5.8. (a) Let $a, b, c$ be integers such that $a \ne 0$ or $b \ne 0$, and $\gcd(a,b) \mid c$. Consider the Diophantine equation $ax + by = c$. Prove that there exists a unique solution $(x, y)$ such that $0 \le x < b/\gcd(a,b)$ and a unique solution $(x', y')$ such that $0 \le y' < a/\gcd(a,b)$;
(b) Let $\|(x,y)\| := \sqrt{x^2 + y^2}$ denote the Euclidean norm of $(x,y)$. Given a solution $(x,y)$ to the Diophantine equation $ax + by = c$, determine the formula for the integer solution $(x', y')$ with the smallest Euclidean norm.

6 Congruences. The Double-and-Add Algorithm

Throughout this section, we fix a positive integer $n$, which we call the modulus.

Definition 6.1. We say that integers $a$ and $b$ are congruent modulo $n$ if $n$ divides $a - b$. We denote this by $a \equiv b \pmod{n}$.

To say that $a$ and $b$ are congruent modulo $n$ is the same as to say that their remainders after division by $n$ are the same. That is, if $a = q_1 n + r_1$ and $b = q_2 n + r_2$, where $0 \le r_1, r_2 < n$, then $a \equiv b \pmod{n}$ if and only if $r_1 = r_2$. A rather surprising fact is that the congruence relation $\equiv$ behaves much like the equality relation $=$.

Proposition 6.2. The congruence relation $\equiv$ is an equivalence relation. That is, it satisfies the following three axioms:
(a) Reflexivity. If $a$ is any integer, then $a \equiv a \pmod{n}$;
(b) Symmetry. If $a \equiv b \pmod{n}$, then $b \equiv a \pmod{n}$;
(c) Transitivity. If $a \equiv b \pmod{n}$ and $b \equiv c \pmod{n}$, then $a \equiv c \pmod{n}$.

Proof. Exercise.

Example 6.3. Let $n = 5$. Then the numbers 7 and 27 are congruent to each other modulo 5, because $5 \mid (27 - 7)$. Also note that both 7 and 27 have the same remainder after division by 5:
\[ 7 = 1 \cdot 5 + 2 \quad \text{and} \quad 27 = 5 \cdot 5 + 2. \]
In fact, it is easy to notice that there are infinitely many numbers congruent to 7 modulo 5. Convince yourself that all of them belong to the set
\[ \{5q + 2 : q \in \mathbb{Z}\} = \{\ldots, -8, -3, 2, 7, 12, \ldots\}. \]

Proposition 6.4 (Proposition 3.3 in Frank Zorzitto, A Taste of Number Theory). Let $n$ be a modulus, and suppose that
\[ a \equiv a_1 \pmod{n}, \qquad b \equiv b_1 \pmod{n}. \]
Then
\[ a \pm b \equiv a_1 \pm b_1 \pmod{n}, \qquad ab \equiv a_1 b_1 \pmod{n}. \]

Proof. Let us first show that $a + b \equiv a_1 + b_1 \pmod{n}$. Note that $n \mid (a - a_1)$ and $n \mid (b - b_1)$. By property 2 of Proposition 2.4,
\[ n \mid (a - a_1) + (b - b_1) = (a + b) - (a_1 + b_1), \]
so by definition we see that $a + b \equiv a_1 + b_1 \pmod{n}$. An analogous proof holds if we replace the plus sign with the minus sign. To see that $ab \equiv a_1 b_1 \pmod{n}$, observe that
\[ ab - a_1 b_1 = ab - a_1 b + a_1 b - a_1 b_1 = (a - a_1) b + a_1 (b - b_1). \]

26 Since n (a a 1 ) and n (b b 1 ), once again, by property 2 of Proposition 2.4 it is the case that n (a a 1 )b + a 1 (b b 1 ) = ab a 1 b 1, and by definition this means that ab a 1 b 1 (mod n). If we now combine Propositions 6.2 and 6.4, it becomes clear that in any congruence, which involves only addition, subtraction and multiplication of integers, we can easily replace a with a 1 whenever a a 1 (mod n). This is known as the replacement principle. Example 6.5. Let f (x) = x 5 10x + 7. We can compute the remainder of f (27) divided by 5 as follows: note that 27 2 (mod 5). Since f (x) involves only addition, subtraction and multiplication of integers, by the replacement principle we can compute f (2) instead of f (27), because f (27) f (2) (mod 5). Also, since 10 0 (mod 5) and 7 2 (mod 5), we see that f (27) f (2) (mod 5). Since 0 4 < 5, we conclude that 4 is the remainder of f (27) divided by 5. Example 6.6. Let us compute the last decimal digit of Note that this is the same as finding the remainder of divided by 10. By the replacement principle, reading from left to right and top to bottom, we have (7 3 ) (3 3 ) 11 (27) (7 3 ) (mod 10). Thus 3 is the last decimal digit of Analogously, we can determine the last k decimal digits of any number by applying the replacement principle modulo 10 k instead of 10. However, as the modulus grows, the computations become more and more challenging. 25

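In practice, in order to compute $a^l \pmod{n}$ for some large power $l$, we utilize the so-called Double-and-Add Algorithm. The algorithm is as follows: first, write the integer $l$ in its binary expansion, i.e.
\[ l = \sum_{i=0}^{k} c_i 2^i = c_k 2^k + c_{k-1} 2^{k-1} + \ldots + c_1 \cdot 2 + c_0, \]
where $c_i \in \{0, 1\}$. Then
\[ a^l \equiv a^{c_k 2^k + c_{k-1} 2^{k-1} + \ldots + c_1 \cdot 2 + c_0} \equiv \left(a^{2^k}\right)^{c_k} \left(a^{2^{k-1}}\right)^{c_{k-1}} \cdots \left(a^2\right)^{c_1} a^{c_0} \pmod{n}. \]
But then note that, for $j$ such that $2 \le j \le k$, we can deduce the value of $a^{2^j}$ from $a^{2^{j-1}}$ modulo $n$ as follows:
\[ a^{2^j} \equiv \left(a^{2^{j-1}}\right)^2 \pmod{n}. \]
Therefore, once $a^2$ is known, we can compute $a^{2^2}, \ldots, a^{2^k}$ in $k - 1$ further squarings.

Example 6.7. Let us compute $n \equiv 7^{114} \pmod{23}$ such that $0 \le n < 23$. Note that
\[ 114 = 64 + 32 + 16 + 2 = 2^6 + 2^5 + 2^4 + 2^1. \]
Then
\[
\begin{aligned}
7^2 &\equiv 49 \equiv 3 \pmod{23}; \\
7^4 &\equiv (7^2)^2 \equiv 3^2 \equiv 9 \pmod{23}; \\
7^8 &\equiv (7^4)^2 \equiv 9^2 \equiv 12 \pmod{23}; \\
7^{16} &\equiv (7^8)^2 \equiv 12^2 \equiv 6 \pmod{23}; \\
7^{32} &\equiv (7^{16})^2 \equiv 6^2 \equiv 13 \pmod{23}; \\
7^{64} &\equiv (7^{32})^2 \equiv 13^2 \equiv 8 \pmod{23}.
\end{aligned}
\]
We can utilize the table above in our calculations:
\[ 7^{114} \equiv 7^{64} \cdot 7^{32} \cdot 7^{16} \cdot 7^{2} \equiv 8 \cdot 13 \cdot 6 \cdot 3 \equiv 9 \pmod{23}. \]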
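The repeated-squaring idea of this section translates directly into a short loop. The following Python sketch is an illustration only; the function name `power_mod` is an assumption, and in practice Python's built-in three-argument `pow` does the same job. It scans the binary digits of $l$ from the least significant end, squaring at every step and multiplying whenever the current digit $c_i$ equals 1.

```python
def power_mod(a, l, n):
    """Compute a**l mod n for l >= 0 by repeated squaring."""
    result = 1 % n
    square = a % n                 # holds a^(2^i) mod n at step i
    while l > 0:
        if l & 1:                  # binary digit c_i is 1
            result = (result * square) % n
        square = (square * square) % n
        l >>= 1
    return result


print(power_mod(7, 114, 23))       # agrees with pow(7, 114, 23)
```

The loop performs one squaring per binary digit of $l$, so the number of multiplications grows with $\log_2 l$ rather than with $l$ itself.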

We will now take a look at some interesting applications of modular arithmetic. For example, it can be used to demonstrate that certain Diophantine equations have no solutions.

Example 6.8. Let us show that the Diophantine equation $x^2 + y^2 = 4z + 3$ has no solutions. This is the same as solving the congruence
\[ x^2 + y^2 \equiv 3 \pmod{4} \]
in integers $x$ and $y$. Since every integer is congruent to either 0, 1, 2 or 3 modulo 4, there are essentially 16 possible combinations of $x$ and $y$ that we can check. However, the problem becomes even simpler if we note that
\[ 0^2 \equiv 0, \quad 1^2 \equiv 1, \quad 2^2 \equiv 0, \quad 3^2 \equiv 1 \pmod{4}. \]
Thus every perfect square is congruent to either 0 or 1 modulo 4. Since we are dealing with the sum of two perfect squares, there are now only three options left to check, namely
\[ 0 + 0 \equiv 0, \quad 0 + 1 \equiv 1, \quad 1 + 1 \equiv 2 \pmod{4}. \]
As we can see, none of them add up to 3, which means that $x^2 + y^2 \equiv 3 \pmod{4}$ has no solutions in integers $x$ and $y$. Therefore there are no solutions to the Diophantine equation $x^2 + y^2 = 4z + 3$.

Exercise 6.9. (a) Show that the Diophantine equation $x^2 + y^2 + z^2 = 8t + 7$ has no solutions for $x, y, z, t \in \mathbb{Z}$;
(b) Let $\mathbb{Z}[\sqrt{2}] := \{a + b\sqrt{2} : a, b \in \mathbb{Z}\}$. Show that there exists a solution to $x^2 + y^2 + z^2 = 8t + 7$ for $x, y, z, t \in \mathbb{Z}[\sqrt{2}]$;
(c) Show that integers $x, y, z, t$ satisfy $x^2 + y^2 + z^2 = 8t + 3$ if and only if $x$, $y$ and $z$ are odd.

In school, you probably heard of divisibility rules for various integers. For example, in order to check that some integer is divisible by 3, one just has to add up all of its decimal digits together and verify that the resulting number is divisible by 3. To verify that some integer $n$ is divisible by 4, one just has to ensure that the number represented by the last two decimal digits of $n$ is divisible by 4. These divisibility rules are consequences of modular arithmetic.
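Before proving such rules, one can test them numerically. The short Python sketch below is an illustration with hypothetical helper names. It checks the slightly stronger statements that $n$ and its digit sum have the same remainder modulo 3, and that $n$ and the number formed by its last two digits have the same remainder modulo 4.

```python
def digit_sum(n):
    """Sum of the decimal digits of a non-negative integer n."""
    return sum(int(d) for d in str(n))


def same_residue_mod3(n):
    # n and its digit sum leave the same remainder modulo 3
    return n % 3 == digit_sum(n) % 3


def same_residue_mod4(n):
    # n and its last two decimal digits leave the same remainder modulo 4
    return n % 4 == (n % 100) % 4


print(all(same_residue_mod3(n) and same_residue_mod4(n)
          for n in range(1, 100000)))
```

A numerical check over a finite range is of course not a proof; it only suggests that the rules are worth proving.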

Example 6.10. Let us prove the following divisibility rule for 3 and 9. Let $n$ be a positive integer, and let $m$ be the sum of the decimal digits of $n$. Then $3 \mid n$ if and only if $3 \mid m$, and $9 \mid n$ if and only if $9 \mid m$.

Let us prove the divisibility rule for 3, as the divisibility rule for 9 is analogous to it. We write the number $n$ in base 10:
\[ n = \sum_{i=0}^{k} a_i 10^i, \]
where $a_i \in \{0, 1, \ldots, 9\}$. Then, by definition,
\[ m = a_k + a_{k-1} + \ldots + a_1 + a_0. \]
Since $10 \equiv 1 \pmod{3}$,
\[
\begin{aligned}
n &\equiv a_k 10^k + a_{k-1} 10^{k-1} + \ldots + a_1 \cdot 10 + a_0 \\
&\equiv a_k 1^k + a_{k-1} 1^{k-1} + \ldots + a_1 \cdot 1 + a_0 \\
&\equiv a_k + a_{k-1} + \ldots + a_1 + a_0 \\
&\equiv m \pmod{3}.
\end{aligned}
\]
We conclude that $3 \mid (n - m)$, so there exists an integer $k_1$ such that $n - m = 3k_1$. Now assume that $3 \mid m$. Then there exists an integer $k_2$ such that $m = 3k_2$. But then $3k_1 = n - m = n - 3k_2$ implies $n = 3(k_1 + k_2)$, which means that $3 \mid n$. Analogously, we can show that if $3 \mid n$, then $3 \mid m$. If we replace the modulus 3 with the modulus 9, the proof will remain the same.

Exercise 6.11. Prove the following divisibility rule for 11. Let $n$ be an integer. Let $m$ be the sum of the digits of $n$ in blocks of two from right to left. Then $11 \mid n$ if and only if $11 \mid m$. Example: if the two-digit blocks of $n$ add up to $m = 264$, which is divisible by 11, then $n$ is divisible by 11 as well.

7 The Ring of Residue Classes $\mathbb{Z}_n$

Recall that, according to our terminology, the set of all integers $\mathbb{Z}$ forms a ring, since $0, 1 \in \mathbb{Z}$ and for all $a$ and $b$ in $\mathbb{Z}$ we have $a \pm b \in \mathbb{Z}$ and $a \cdot b \in \mathbb{Z}$. Now let $n$

be a modulus. In this section, we will introduce the first example of a finite ring $\mathbb{Z}_n$ and study its properties. As the name suggests, this ring will have only finitely many elements. Just like the ring of integers $\mathbb{Z}$, it will contain its own analogues of 0 and 1, and we will also endow it with the operations of addition, subtraction and multiplication, which will be very similar to the operations in $\mathbb{Z}$.

Definition 7.1. Let $a$ be an integer. The set
\[ [a] := \{nq + a : q \in \mathbb{Z}\} \]
is called the residue class of $a$ modulo $n$. The integer $a$ is called a representative of the residue class $[a]$.

Note that $[a] = [b]$ if and only if $a \equiv b \pmod{n}$. Also, each residue class contains an integer $r$ such that $0 \le r < n$. It is conventional to pick such integers as representatives. For example, if $n = 5$, even though one can denote the set of all integers congruent to 17 modulo 5 by $[17]$, we would rather use $[2]$ instead, since $17 \equiv 2 \pmod{5}$ and $0 \le 2 < 5$. Since there are only $n$ integers $r$ with $0 \le r < n$, namely $0, 1, 2, \ldots, n-1$, and each integer is congruent modulo $n$ to exactly one of these numbers, we see that there are exactly $n$ residue classes modulo $n$.

Exercise 7.2. Let $n$ be a positive integer. Prove that the residue classes $[0], [1], \ldots, [n-1]$ modulo $n$ partition the integers. That is,
\[ [0] \cup [1] \cup \ldots \cup [n-1] = \mathbb{Z}, \]
and also $[a] \cap [b] \ne \emptyset$ implies $[a] = [b]$. Hint: use Proposition 6.2.

We denote the collection of residues modulo $n$ by $\mathbb{Z}/n\mathbb{Z}$ or $\mathbb{Z}_n$. (The latter notation might be ambiguous, as when $p$ is prime the symbol $\mathbb{Z}_p$ is used to represent the ring of $p$-adic integers.) Since the notation $\mathbb{Z}_n$ is utilized in your course notes, we will stick with it in these lecture notes.

Proposition 7.3. Let $n$ be a positive integer and consider the collection $\mathbb{Z}_n$ of all residues modulo $n$. Define the binary operations $+$, $-$ and $\cdot$ as follows:
\[ [a] \pm [b] := [a \pm b] \quad \text{and} \quad [a] \cdot [b] := [a \cdot b]. \]
Then, under these binary operations, $\mathbb{Z}_n$ forms a ring.

Proof. Exercise. Hint: use Proposition 6.4.

Note that $\mathbb{Z}_n$ is a finite ring. When we carry out operations in $\mathbb{Z}_n$, we are doing modular arithmetic. To do modular arithmetic, just carry out the regular arithmetic and then replace the result with any integer congruent to it modulo $n$ (once again, conventionally we pick a representative $r$ such that $0 \le r < n$).

Example 7.4. Here are two examples of modular arithmetic in $\mathbb{Z}_{17}$:
\[ [33] + [12] = [16] + [12] = [28] = [11], \]
\[ [11] \cdot [19] = [11] \cdot [2] = [22] = [5]. \]
Note that, in the case of addition, one could slightly simplify the computations by noting that $33 \equiv -1 \pmod{17}$:
\[ [33] + [12] = [-1] + [12] = [11]. \]
After all, dealing with $-1$ is much simpler than with 16.

Despite the fact that $\mathbb{Z}_n$ behaves much like $\mathbb{Z}$, some of its properties might be rather unpleasant. For example, $\mathbb{Z}$ has no zero divisors apart from 0. In other words, the identity $ab = 0$ implies that either $a = 0$ or $b = 0$. In general, this is not true for $\mathbb{Z}_n$.

Example 7.5. To see that $\mathbb{Z}_6$ contains zero divisors that are $\ne [0]$, note that
\[ [2] \cdot [3] = [6] = [0] = [2] \cdot [0]. \]
Thus we see that $[2] \cdot [3] = [0]$ in $\mathbb{Z}_6$, even though $[2] \ne [0]$ and $[3] \ne [0]$. The same is true for $\mathbb{Z}_{15}$:
\[ [3] \cdot [5] = [15] = [0] = [3] \cdot [0]. \]
Thus we see another major difference between $\mathbb{Z}$ and $\mathbb{Z}_n$: in $\mathbb{Z}$, the expression $ab = ac$ with $a \ne 0$ implies $b = c$. However, in general, this is no longer true for $\mathbb{Z}_n$. It is not difficult to show that, in fact, $\mathbb{Z}_n$ has no non-trivial zero divisors if and only if $n$ is prime or $n = 1$.
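For small moduli, zero divisors can simply be listed by brute force. The following Python sketch is illustrative only (the function name `zero_divisors` is an assumption). It represents each residue class by its representative in $\{0, 1, \ldots, n-1\}$ and searches for non-zero classes whose product with some non-zero class is $[0]$.

```python
def zero_divisors(n):
    """Representatives a in 1..n-1 with a*b ≡ 0 (mod n) for some b in 1..n-1."""
    return [a for a in range(1, n)
            if any((a * b) % n == 0 for b in range(1, n))]


print((33 + 12) % 17, (11 * 19) % 17)   # the two computations of Example 7.4
for n in (6, 15, 7, 13):
    print(n, zero_divisors(n))
```

For $n = 6$ and $n = 15$ the lists are non-empty, while for the primes 7 and 13 they are empty, in line with the closing remark above.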

8 Linear Congruences

Let $n$ be a modulus. We will now turn our attention to equations in $\mathbb{Z}_n$. Let $a, b$ be integers, and consider the linear equation
\[ [a][x] = [b], \]
where $x$ is an unknown integer.

Example 8.1. The linear equation
\[ [7][x] = [3] \]
has only one solution in $\mathbb{Z}_{13}$, namely $[x] = [6]$. As there are only finitely many possibilities, we may check all of them, from $[0]$ to $[12]$, in order to find a solution. Even though there is only one solution in $\mathbb{Z}_{13}$, there are actually infinitely many solutions in $\mathbb{Z}$. This is because any integer $y \in [6]$, that is, any integer of the form $y = 13q + 6$, satisfies
\[ 7y \equiv 3 \pmod{13}. \]
The linear equation
\[ [3][x] = [6] \]
has three solutions in $\mathbb{Z}_9$, namely $[x] = [2]$, $[x] = [5]$ and $[x] = [8]$. Here we see the principal difference between the linear equation in $\mathbb{Z}_n$ and the linear equation $cx = d$ in $\mathbb{Z}$: the only way $cx = d$ can have more than one solution is if $c = d = 0$. Finally, the equation
\[ [3][x] = [7] \]
has no solutions in $\mathbb{Z}_9$. Once again, we can easily verify this by plugging in all the possible values of $[x] = [0], [1], \ldots, [8]$.

It turns out that the tools that we have in our hands right now can help us to solve the linear congruence easily. Observe that
\[ [a][x] = [ax] = [b], \]
and this is the same as solving the congruence
\[ ax \equiv b \pmod{n}. \]

Now by definition, $n$ has to divide $ax - b$, so there exists an integer $y$ such that $ax - b = n(-y)$. In other words, the linear congruence $[a][x] = [b]$ has a solution if and only if the Diophantine equation
\[ ax + ny = b \]
has a solution in integers $x$ and $y$. From what we have learned in Section 3, it immediately follows that the linear equation $[a][x] = [b]$ has no solutions if and only if $\gcd(a,n) \nmid b$ (verify this criterion on the last two equations in Example 8.1). When the solutions exist, we can use the Extended Euclidean Algorithm to find them.

Example 8.2. Let us consider the linear equation $[440][x] = [80]$ in $\mathbb{Z}_{300}$. From Example 5.7 we know that the solutions to
\[ 440x + 300y = 80 \]
in integers $x$ and $y$ are of the form
\[ x = -8 + 15n \quad \text{and} \quad y = 12 - 22n, \]
where $n$ is an integer. Thus $[440][-8 + 15n] = [80]$ in $\mathbb{Z}_{300}$. By evaluating at $n = 1, 2, \ldots, 20$ we obtain 20 distinct solutions in $\mathbb{Z}_{300}$, namely
\[ [7], [22], [37], \ldots, [292]. \]
Note that $\gcd(440, 300) = 20$ and there are 20 distinct solutions. In Exercise 8.3, you are asked to prove that this phenomenon holds in general.

Exercise 8.3. Let $n \ge 1$ be a modulus, and let $a, b$ be integers such that $a \ne 0$. Prove that, if $\gcd(a,n) \mid b$, then the total number of distinct residue classes satisfying $[a][x] = [b]$ is equal to $\gcd(a,n)$.

9 The Group of Units $\mathbb{Z}_n^{*}$

Let $n$ be a modulus and consider the finite ring $\mathbb{Z}_n$ of residues modulo $n$. Recall that, in general, the ring $\mathbb{Z}_n$ does not enjoy the property that if $[a][b] = [a][c]$ and $[a] \ne [0]$ then $[b] = [c]$ (see Example 7.5). However, for special values of $[a]$ called units this cancellation law actually holds.
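Looking back at Section 8, the solvability criterion and the cancellation idea just mentioned combine into a small solver for $ax \equiv b \pmod{n}$. The Python sketch below is one possible arrangement, not the notes' own procedure: the name `solve_linear_congruence` is an assumption, and the modular inverse is obtained from Python's built-in `pow(x, -1, m)` (available from Python 3.8) instead of an explicit run of the Extended Euclidean Algorithm.

```python
from math import gcd


def solve_linear_congruence(a, b, n):
    """Return all residues x in [0, n) with a*x ≡ b (mod n), in increasing order."""
    g = gcd(a, n)
    if b % g != 0:
        return []                      # gcd(a, n) does not divide b
    # Divide through by g: (a/g) x ≡ b/g (mod n/g), where a/g is invertible.
    a_red, b_red, n_red = a // g, b // g, n // g
    if n_red == 1:
        x0 = 0
    else:
        x0 = (pow(a_red, -1, n_red) * b_red) % n_red
    return [x0 + k * n_red for k in range(g)]


print(solve_linear_congruence(7, 3, 13))
print(solve_linear_congruence(3, 7, 9))
solutions = solve_linear_congruence(440, 80, 300)
print(len(solutions), solutions[:3], solutions[-1])
```

For the equation of Example 8.2 the sketch returns 20 residues, starting with 7, 22, 37 and ending with 292.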

Definition 9.1. The residue class $[a]$ in $\mathbb{Z}_n$ is called a unit if there exists a solution to $[a][x] = [1]$ in $\mathbb{Z}_n$. If $[a]$ is a unit, we say that any integer $b \in [a]$ is invertible modulo $n$.

Proposition 9.2. The following statements are equivalent:
1. $[a]$ is a unit;
2. For all integers $b$ and $c$, $[a][b] = [a][c]$ implies $[b] = [c]$;
3. $a$ and $n$ are coprime.

Proof. Let us prove that 1 implies 2. Since $[a]$ is a unit, there exists an integer $x$ such that $[a][x] = [1]$. Now suppose that $[a][b] = [a][c]$ for some integers $b$ and $c$. Then
\[ [x][a][b] = [x][a][c]. \]
Since $\mathbb{Z}_n$ is a commutative ring, we see that $[x][a] = [a][x] = [1]$. Thus the above equality simplifies to $[1][b] = [1][c]$, and this implies $[b] = [c]$.

To prove that 2 implies 3, suppose that the statement is false and $a$ and $n$ are not coprime. Without loss of generality, we may assume that $0 \le a < n$. Then there exists an integer $p > 1$ such that $a = pk_1$ and $n = pk_2$ for some integers $k_1$ and $k_2$. Since $p > 1$, we conclude that $1 \le k_2 < n$, which in turn implies $k_2 \not\equiv 0 \pmod{n}$. But then
\[ ak_2 = pk_1 k_2 = pk_2 k_1 = nk_1 \equiv 0 \equiv a \cdot 0 \pmod{n}. \]
Thus we see that $[a][k_2] = [a][0]$, even though $[k_2] \ne [0]$. This contradicts our assumption, so $a$ and $n$ are coprime.

Finally, let us demonstrate that 3 implies 1. Since $a$ and $n$ are coprime, by Bézout's lemma there exist integers $x$ and $y$ such that $ax + ny = 1$. This means that $[a][x] = [1]$, so by Definition 9.1 the residue class $[a]$ is a unit.

Corollary 9.3. Let $[a]$ be a unit in $\mathbb{Z}_n$. Then for any integer $b$ the equation $[a][x] = [b]$ has a unique solution.

Proof. Since $[a]$ is a unit, there exists $[u]$ with $[a][u] = [1]$, and then $[x] = [u][b]$ is a solution. Suppose that there are two solutions $[x]$ and $[y]$, so $[a][x] = [b] = [a][y]$. By property 2 of Proposition 9.2, the identity $[a][x] = [a][y]$ implies $[x] = [y]$.

Note that the statements of Proposition 9.2 and Corollary 9.3 can be translated from the language of residue classes to the language of congruences. For example, property 1 simply states that $ax \equiv 1 \pmod{n}$ has a solution, while property 2 states that $ab \equiv ac \pmod{n}$ implies $b \equiv c \pmod{n}$. Finally, Corollary 9.3 implies that, when $a$ is invertible modulo $n$, the congruence $ax \equiv b \pmod{n}$ has a unique solution such that $0 \le x < n$, and all integer solutions to this congruence must be of the form $x + nq$ for $q \in \mathbb{Z}$.

Proposition 9.4. If $p$ is prime and $[a] \ne [0]$ in $\mathbb{Z}_p$, then $[a]$ is a unit. Furthermore, $\mathbb{Z}_p$ has no zero divisors apart from $[0]$ itself.

Proof. Since $[a] \ne [0]$, without loss of generality we may assume that $1 \le a < p$. Note that this implies that $a$ and $p$ are coprime, for otherwise $\gcd(a, p) = d > 1$ would imply $d = p$. But then $p = d \le a$ and $a < p$ at the same time, a contradiction. Since $\gcd(a, p) = 1$, by Bézout's lemma there exist integers $x$ and $y$ such that $ax + py = 1$. But then $[a][x] = [1]$, so by Definition 9.1 the residue class $[a]$ must be a unit in $\mathbb{Z}_p$. Since every unit obeys the cancellation law stated in property 2 of Proposition 9.2, it follows that $\mathbb{Z}_p$ has no zero divisors apart from $[0]$ itself.

Definition 9.5. Let $[a]$ be a unit in $\mathbb{Z}_n$. The element $[x]$ satisfying $[a][x] = [1]$ is called the inverse of $[a]$ in $\mathbb{Z}_n$ and is denoted by $[a]^{-1}$.

When translated to the language of congruences, the fact that $a$ is invertible modulo $n$ implies the existence of an integer, which we denote by $a^{-1}$, such that $a \cdot a^{-1} \equiv 1 \pmod{n}$.

Definition 9.6. The set of all units of $\mathbb{Z}_n$ is called the group of units of $\mathbb{Z}_n$ and is denoted by $\mathbb{Z}_n^{*}$.

Proposition 9.7. The set $\mathbb{Z}_n^{*}$ of all units of $\mathbb{Z}_n$ forms a group under the operation of multiplication. That is, it satisfies the following four group axioms:
1. Closure. For all $[a], [b] \in \mathbb{Z}_n^{*}$, $[a] \cdot [b] \in \mathbb{Z}_n^{*}$;
2. Associativity. $([a] \cdot [b]) \cdot [c] = [a] \cdot ([b] \cdot [c])$;

3. Identity element. For all $[a]$ in $\mathbb{Z}_n^{*}$, the element $[1]$ satisfies $[a] \cdot [1] = [1] \cdot [a] = [a]$;
4. Inverse element. For each $[a]$ in $\mathbb{Z}_n^{*}$ there exists an element $[a]^{-1}$ in $\mathbb{Z}_n^{*}$ such that $[a] \cdot [a]^{-1} = [a]^{-1} \cdot [a] = [1]$.
Furthermore, the group of units $\mathbb{Z}_n^{*}$ is finite and Abelian (in the context of groups, it is conventional to use the word Abelian instead of commutative):
5. Abelianness. For all $[a], [b] \in \mathbb{Z}_n^{*}$, $[a] \cdot [b] = [b] \cdot [a]$;
6. Finiteness. There are only finitely many elements in $\mathbb{Z}_n^{*}$.

Proof. Exercise.

Example 9.8. Let us compute $\mathbb{Z}_{10}^{*}$. By Proposition 9.2, it suffices to find all integers $m$, $0 \le m < 10$, that are coprime to 10. Thus
\[ \mathbb{Z}_{10}^{*} = \{[1], [3], [7], [9]\}. \]
To convince ourselves that $\mathbb{Z}_{10}^{*}$ is closed under the operation of multiplication, let us construct the multiplication table:
\[
\begin{array}{c|cccc}
\cdot & 1 & 3 & 7 & 9 \\ \hline
1 & 1 & 3 & 7 & 9 \\
3 & 3 & 9 & 1 & 7 \\
7 & 7 & 1 & 9 & 3 \\
9 & 9 & 7 & 3 & 1
\end{array}
\]
We can see that all of the elements in the multiplication table are indeed in $\mathbb{Z}_{10}^{*}$. Furthermore, we see that each row, as well as each column in this table, is just a permutation of 1, 3, 7 and 9. In the future, we will see that this is not a coincidence.

10 Euler's Theorem and Fermat's Little Theorem

We will now prove our first non-trivial result, Euler's Theorem.

Definition 10.1. Let $\varphi(n)$ denote the number of integers $m$ such that $0 \le m < n$ and $\gcd(m,n) = 1$. The function $\varphi$ is called Euler's totient function.
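The units modulo $n$ and the value $\varphi(n)$ can be computed straight from the definitions, at least for small $n$. The following Python sketch is a naive illustration (the names `units` and `phi` are assumptions); it tests every $m$ with $0 \le m < n$ for coprimality with $n$ and then prints a multiplication table modulo 10.

```python
from math import gcd


def units(n):
    """Representatives m with 0 <= m < n and gcd(m, n) == 1."""
    return [m for m in range(n) if gcd(m, n) == 1]


def phi(n):
    """Euler's totient function, computed by direct counting."""
    return len(units(n))


u = units(10)
print(u, phi(10))
for a in u:                          # multiplication table of the units modulo 10
    print(a, [(a * b) % 10 for b in u])
```

Each printed row is a rearrangement of the list of units, as observed in Example 9.8. Counting in this way takes about $n$ steps, so it is only practical for small moduli.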

Exercise 10.2. Let $\#X$ denote the cardinality of a set $X$. Let $n$ be a modulus. Prove that $\varphi(n) = \#\mathbb{Z}_n^{*}$.

Theorem 10.3 (Euler's Theorem; Theorem 3.16 in Frank Zorzitto, A Taste of Number Theory). If $[a] \in \mathbb{Z}_n^{*}$, then $[a]^{\varphi(n)} = [1]$.

Proof. Let $k = \varphi(n)$. Let $[1] = [u_1], [u_2], \ldots, [u_k]$ be the complete list of elements of $\mathbb{Z}_n^{*}$. Since $\mathbb{Z}_n^{*}$ is a group, all the elements
\[ [a] \cdot [u_1], \; [a] \cdot [u_2], \; \ldots, \; [a] \cdot [u_k] \]
are in $\mathbb{Z}_n^{*}$. Furthermore, no element appears in this list twice, for if $[a] \cdot [u_i] = [a] \cdot [u_j]$ for some $i \ne j$, then $[u_i] = [u_j]$ by property 2 of Proposition 9.2. Hence the second list is just a permutation of $[u_1], [u_2], \ldots, [u_k]$. Thus
\[ [u_1] \cdot [u_2] \cdots [u_k] = ([a] \cdot [u_1]) \cdot ([a] \cdot [u_2]) \cdots ([a] \cdot [u_k]). \]
Since $\mathbb{Z}_n^{*}$ is an Abelian group, we can rearrange the order of multiplication in order to obtain
\[ [u_1] \cdot [u_2] \cdots [u_k] = [a]^k \cdot [u_1] \cdot [u_2] \cdots [u_k]. \]
Finally, we refer to property 2 of Proposition 9.2 to cancel the unit $[u_1] \cdot [u_2] \cdots [u_k]$, and conclude that $[a]^k = [1]$.

In the language of congruences, Euler's Theorem translates to
\[ a^{\varphi(n)} \equiv 1 \pmod{n} \]
for every integer $a$ that is invertible modulo $n$.

Example 10.4. Let us prove that 1223 divides $623^{1222} - 1$. This becomes evident once we note that $\varphi(1223) = 1222$ and $\gcd(1223, 623) = 1$ (so $[623]$ is a unit in $\mathbb{Z}_{1223}$). By Euler's Theorem,
\[ 623^{1222} \equiv 1 \pmod{1223}, \]
which means that 1223 divides $623^{1222} - 1$.
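Euler's Theorem is easy to test numerically for small moduli. The sketch below is an illustration, not a proof; the helper names `phi` and `euler_check` are assumptions. It computes $\varphi(n)$ by direct counting and uses Python's built-in modular exponentiation `pow(a, k, n)` to check that $a^{\varphi(n)} \equiv 1 \pmod{n}$ for every $a$ coprime to $n$.

```python
from math import gcd


def phi(n):
    """Euler's totient function, computed by direct counting."""
    return sum(1 for m in range(n) if gcd(m, n) == 1)


def euler_check(n):
    """True if a**phi(n) ≡ 1 (mod n) for every a in [0, n) coprime to n."""
    k = phi(n)
    return all(pow(a, k, n) == 1 for a in range(n) if gcd(a, n) == 1)


print(all(euler_check(n) for n in range(2, 200)))
print(phi(1223), pow(623, phi(1223), 1223))   # the numbers of the example above
```

The second line of output reports the totient of 1223 and the residue of the corresponding power of 623, which is 1 as the theorem predicts.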

38 Corollary (Fermat s Little Theorem) Let p be prime. Then for any integer a such that p a it is the case that [a] p 1 = [1]. In other words, a p 1 1 (mod p). Proof. Note that for any integer a such that 1 a < p it is the case that gcd(a, p) = 1. Thus [a] is a unit in Z p and ϕ(p) = p 1. The result then follows from Euler s Theorem. The theorems of Euler and Fermat give us a useful tool for raising integers to high powers modulo n. Proposition If n is a modulus, a is coprime to n, and k, l are non-negative integers such that k l (mod ϕ(n)), then a k a l (mod n). Proof. Say k l. We are given that l = qϕ(n) + k for some q 0. Then, by Euler s Theorem, ( a l = a qϕ(n)+k = a ϕ(n)) q a k 1 q a k = a k (mod n). Example Let us compute modulo 33. Note that ϕ(33) = 20. Since gcd(17,33) = 1, by Euler s theorem it first makes sense to reduce modulo 20. We can apply Euler s Theorem again here. Note that ϕ(20) = 8, and since gcd(7,8) = 1 we can see that (mod 20). But then, by Proposition 10.6, = (mod 20). Thus (mod 33). Exercise Compute the integer n, 0 n < 55, such that n (mod 55). 12 Proposition 3.20 in Frank Zorzitto, A Taste of Number Theory. 37

11 The Chinese Remainder Theorem

Now that we know how to solve linear congruences, let us try to understand how to work with systems of congruences. Since the congruence relation $\equiv$ behaves much like the equality relation $=$, solving a system of linear congruences with a single modulus would be very similar to solving a system of linear equations, which we already know how to handle through the methods of linear algebra. On the other hand, if we consider systems of congruences with different moduli, things might get interesting. We will merely consider the simplest example of such systems, namely
\[
\begin{aligned}
x &\equiv a_1 \pmod{n_1}, \\
x &\equiv a_2 \pmod{n_2}, \\
&\;\;\vdots \\
x &\equiv a_k \pmod{n_k},
\end{aligned}
\]
where $a_1, a_2, \ldots, a_k$ are integers and $n_1, n_2, \ldots, n_k$ are positive integers greater than 1 that are pairwise coprime. Our goal here is to determine $x$, which satisfies all of the $k$ congruences above. The existence of such an $x$ is asserted by the Chinese Remainder Theorem. Before proceeding to its statement, let us recall Proposition 3.12 and the following consequence of it.

Proposition 11.1. Let $m$ and $n$ be integers greater than 1 that are coprime. Then the congruence
\[ a \equiv b \pmod{mn} \]
is true if and only if both of the congruences
\[ a \equiv b \pmod{m}, \qquad a \equiv b \pmod{n} \]
are true.

Proof. Suppose that $a \equiv b \pmod{mn}$. Then $mn \mid (a - b)$. But then $m \mid (a - b)$ and $n \mid (a - b)$ so, by definition, $a \equiv b \pmod{m}$ and $a \equiv b \pmod{n}$. To prove the converse, suppose that $a \equiv b \pmod{m}$ and $a \equiv b \pmod{n}$. Then $m \mid (a - b)$ and $n \mid (a - b)$. Since $\gcd(m,n) = 1$, we may apply Proposition 3.12 to conclude that $mn \mid (a - b)$. Thus $a \equiv b \pmod{mn}$.

Theorem 11.2 (The Chinese Remainder Theorem; Theorem 4.2 in Frank Zorzitto, A Taste of Number Theory). If $m$, $n$ are coprime moduli and $a$, $b$ are any integers, then the congruences
\[ x \equiv a \pmod{m}, \qquad x \equiv b \pmod{n} \]
have a common solution $x$. Furthermore, any two solutions $x, y$ to this pair of congruences must be such that $x \equiv y \pmod{mn}$.

Proof. Since $m$ and $n$ are coprime, by Bézout's lemma the equation $mt - ns = b - a$ can be solved in integers $s$ and $t$. Thus $mt + a = ns + b = x$. Note that $x \equiv a \pmod{m}$ and $x \equiv b \pmod{n}$, which makes it a solution to both congruences. If $y$ is another solution to the system of congruences, then
\[ x \equiv y \pmod{m}, \qquad x \equiv y \pmod{n}. \]
By Proposition 11.1, we conclude that $x \equiv y \pmod{mn}$.

We can easily generalize this result to an arbitrary number of pairwise coprime moduli.

Theorem 11.3 (Generalized Chinese Remainder Theorem; Theorem 4.3 in Frank Zorzitto, A Taste of Number Theory). Suppose $n_1, n_2, \ldots, n_k$ are moduli that are pairwise coprime. That is, $n_i$ and $n_j$ are coprime when $i \ne j$. If $a_1, a_2, \ldots, a_k$ are integers, then there exists an integer $x$ such that
\[
\begin{aligned}
x &\equiv a_1 \pmod{n_1}, \\
x &\equiv a_2 \pmod{n_2}, \\
&\;\;\vdots \\
x &\equiv a_k \pmod{n_k}.
\end{aligned}
\]
Furthermore, if $x_0$ is such a solution of these congruences, then the complete solution is given by all $x \equiv x_0 \pmod{n_1 n_2 \cdots n_k}$.
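The construction in the proof of the two-modulus theorem, followed by repeated merging for several moduli, can be sketched in Python as follows. This is an illustration under stated assumptions: the names `crt_pair` and `crt` are hypothetical, all moduli are assumed to be greater than 1 and pairwise coprime, and the inverse of $m$ modulo $n$ is taken from the built-in `pow(m, -1, n)` (Python 3.8 or later) rather than from an explicit Bézout computation.

```python
def crt_pair(a, m, b, n):
    """Return x in [0, m*n) with x ≡ a (mod m) and x ≡ b (mod n); gcd(m, n) = 1."""
    # Solve m*t ≡ b - a (mod n), then x = a + m*t satisfies both congruences.
    t = ((b - a) * pow(m, -1, n)) % n
    return (a + m * t) % (m * n)


def crt(residues, moduli):
    """Merge the congruences x ≡ residues[i] (mod moduli[i]) one at a time."""
    x, modulus = 0, 1
    for a, n in zip(residues, moduli):
        x = crt_pair(x, modulus, a, n)
        modulus *= n
    return x


print(crt_pair(3, 6, 7, 13))         # 33, since 33 = 5*6 + 3 = 2*13 + 7
print(crt([2, 3, 2], [3, 5, 7]))     # a classical illustrative system
```

Merging one congruence at a time works because the product of the moduli processed so far stays coprime to the next modulus.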

Example 11.4. Let us solve the system of congruences
\[
\begin{cases}
x \equiv 3 \pmod{6}, \\
x \equiv 7 \pmod{13}.
\end{cases}
\]
Since 6 and 13 are coprime, by Bézout's lemma there exist integers $u$ and $v$ such that
\[ 6u + 13v = 1. \]
Note that $u = -2$ and $v = 1$ give us an answer. We can multiply both sides of the above equality by $7 - 3 = 4$ to obtain a solution to
\[ 6u + 13v = 7 - 3. \]
Such a solution is given by $u = 4 \cdot (-2) = -8$ and $v = 1 \cdot 4 = 4$. After rearranging, we get
\[ 3 + 6u = 7 - 13v = -45. \]
Note that $-45 \equiv 3 \pmod{6}$ and $-45 \equiv 7 \pmod{13}$. Since 6 and 13 are coprime, by the Chinese Remainder Theorem the congruence
\[ x \equiv -45 \equiv 33 \pmod{78} \]
captures all integer solutions to the original system of congruences.

Exercise 11.5. Solve the system of congruences
\[
\begin{cases}
x \equiv 3 \pmod{5}, \\
x \equiv 5 \pmod{7}, \\
x \equiv 7 \pmod{11}.
\end{cases}
\]

12 Polynomial Congruences

The Chinese Remainder Theorem can also be utilized to solve polynomial congruences. Let $d$ be a positive integer and consider a polynomial
\[ f(x) = c_d x^d + c_{d-1} x^{d-1} + \ldots + c_1 x + c_0 \]
with integer coefficients $c_0, c_1, c_2, \ldots, c_d$. Then the congruence of the form
\[ f(x) \equiv 0 \pmod{n} \qquad (4) \]
is called a polynomial congruence. We would like to find all integers $x$ which satisfy such a congruence. Note that, if we replace the coefficients $c_i$ of $f(x)$ with

their residue classes $[c_i]$, thus reducing our polynomial from $\mathbb{Z}$ to $\mathbb{Z}_n$, solving the congruence (4) is equivalent to solving the equation $f([x]) = [0]$ in $\mathbb{Z}_n$. If such an equation is satisfied by some residue class $[x_0]$, we say that $[x_0]$ is a root of $f(x)$ in $\mathbb{Z}_n$.

Let $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ be the prime factorization of $n$. Then, as it turns out, there is a one-to-one correspondence between solutions to the congruence (4) and solutions to the system of congruences
\[
\begin{aligned}
f(x) &\equiv 0 \pmod{p_1^{e_1}}; \\
f(x) &\equiv 0 \pmod{p_2^{e_2}}; \\
&\;\;\vdots \\
f(x) &\equiv 0 \pmod{p_k^{e_k}}.
\end{aligned}
\]
This result follows from the next proposition, which is very similar to Proposition 11.1.

Proposition 12.1. Let $f(x) \in \mathbb{Z}[x]$ be a polynomial. Let $m$ and $n$ be coprime moduli. Then $f(x) \equiv 0 \pmod{mn}$ if and only if
\[
\begin{cases}
f(x) \equiv 0 \pmod{m}; \\
f(x) \equiv 0 \pmod{n}.
\end{cases}
\]

Proof. Suppose that $f(x) \equiv 0 \pmod{mn}$. Then $mn \mid f(x)$, which means that $m \mid f(x)$ and $n \mid f(x)$. Conversely, suppose that $f(x) \equiv 0 \pmod{m}$ and $f(x) \equiv 0 \pmod{n}$. Then $m \mid f(x)$ and $n \mid f(x)$. Since $m$ and $n$ are coprime, it follows from Proposition 3.12 that $mn \mid f(x)$.

Coming back to our previous notation, if $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime factorization of $n$, and integers $x_1, x_2, \ldots, x_k$ satisfy
\[ f(x_i) \equiv 0 \pmod{p_i^{e_i}} \]

for $i = 1, 2, \ldots, k$, then we can find $x$ such that $x \equiv x_i \pmod{p_i^{e_i}}$ for all $i$ using the Generalized Chinese Remainder Theorem. But then such an $x$ would satisfy $f(x) \equiv 0 \pmod{p_i^{e_i}}$ for all $i$, and therefore $f(x) \equiv 0 \pmod{n}$. From here it follows that, if each congruence $f(x) \equiv 0 \pmod{p_i^{e_i}}$ has $s_i$ solutions, then the congruence $f(x) \equiv 0 \pmod{n}$ has $s_1 s_2 \cdots s_k$ solutions.

Now we would like to determine how many solutions a polynomial congruence $f(x) \equiv 0 \pmod{p^e}$ has. Due to time limitations, we will answer this question only in the case $e = 1$, and show that there are at most $d$ solutions, where $d$ is the degree of $f(x)$. We remark that for $e > 1$ the bound $d$ can fail: for example, $x^2 \equiv 0 \pmod{9}$ has the three solutions $x \equiv 0, 3, 6$. The most accurate estimates on the number of solutions of polynomial congruences were established in 1991 by the Canadian mathematician Cameron L. Stewart, who is currently a professor at the University of Waterloo.

Proposition 12.2 (Proposition 5.14 in Frank Zorzitto, A Taste of Number Theory). If $p$ is prime and $f(x)$ is a polynomial of degree $d$ with coefficients in $\mathbb{Z}_p$, then $f(x)$ has at most $d$ roots in $\mathbb{Z}_p$.

Proof. We will prove this result by induction on the degree $d$ of a polynomial $f(x)$.

Base case. Let $d = 0$. Then $f(x) = \alpha_0$ for some non-zero $\alpha_0$ in $\mathbb{Z}_p$. Clearly, this polynomial has $0 \le d = 0$ roots, so the result holds.

Induction hypothesis. Suppose that the result is true for all polynomials of degrees $k = 0, 1, \ldots, d-1$.

Induction step. We will show that the result holds for every polynomial of degree $k = d$. Let
\[ f(x) = \alpha_d x^d + \alpha_{d-1} x^{d-1} + \ldots + \alpha_1 x + \alpha_0, \]
where $\alpha_d \ne 0$. If $f(x)$ has no roots, then surely $0 \le d$. Otherwise $f(x)$ has a root, say $\beta$. Then
\[ f(x) = f(x) - 0 = f(x) - f(\beta) = \alpha_d (x^d - \beta^d) + \alpha_{d-1} (x^{d-1} - \beta^{d-1}) + \ldots + \alpha_1 (x - \beta). \]
Now recall that, for any positive integer $j \ge 2$ it is the case that
\[ x^j - \beta^j = (x - \beta)(x^{j-1} + x^{j-2}\beta + x^{j-3}\beta^2 + \ldots + x\beta^{j-2} + \beta^{j-1}). \]

44 Now we see that we can factor out (x β) in the expression for f (x) given above, which means that f (x) = (x β)g(x) for some polynomial g(x) with coefficients in Z p. Clearly, the degree of g(x) does not exceed d 1, so we can apply the inductive hypothesis to conclude that g(x) as at most d 1 roots. Let γ β be some root of f (x). Then 0 = f (γ) = (γ β)g(γ). We claim that g(γ) = 0. For assume otherwise, so that g(γ) 0 and γ β 0. But then both γ β and g(γ) are non-trivial zero divisors in Z p, and this contradicts Proposition 9.4, which asserts that there are no non-trivial zero divisors in Z p whenever p is prime. We conclude that g(γ) = 0. Since every root of f (x) is either equal to β or one of at most d 1 roots of g(x), we conclude that there are at most d roots of f (x). Example Let us solve the polynomial congruence x x (mod 119). Note that 119 = By Proposition 12.1, there is a one-to-one correspondence between the roots to the above congruence and the roots to the system of congruences { x x (mod 7); x x (mod 17). Let us solve each of these congruences separately. Consider the case n = 7 with ϕ(7) = 6. Note that x 0 (mod 7) is not a solution. This means that gcd(x,7) = 1, so we may apply Euler s Theorem: Thus we need to solve the congruence x x x x x + 2x x 3 + x + 3 (mod 7). 2x 3 + x (mod 7). 43

After evaluating the left hand side at $x = 1, 2, 3, 4, 5, 6$, we can convince ourselves that there are only two solutions, namely $x \equiv 2 \pmod 7$ and $x \equiv 6 \pmod 7$.

Consider the case $n = 17$ with $\varphi(17) = 16$. Note that $x \equiv 0 \pmod{17}$ is not a solution. This means that $\gcd(x, 17) = 1$, so we may apply Euler's Theorem: since $x^{16} \equiv 1 \pmod{17}$, every exponent can be reduced modulo 16, and the left hand side reduces to $3x + 24 \pmod{17}$. Thus we need to solve the congruence
$$3x + 24 \equiv 0 \pmod{17}.$$
We see that $x \equiv -8 \equiv 9 \pmod{17}$ is a solution. Since 17 is prime, it follows from Proposition 12.2 that this is the only solution.

Since there are two solutions modulo 7 and only one solution modulo 17, we conclude that there are $2 \cdot 1 = 2$ solutions modulo $7 \cdot 17 = 119$. These solutions correspond to two systems of congruences:
$$\begin{cases} x \equiv 2 \pmod 7, \\ x \equiv 9 \pmod{17}; \end{cases} \qquad \text{and} \qquad \begin{cases} x \equiv 6 \pmod 7, \\ x \equiv 9 \pmod{17}. \end{cases}$$
We can compute solutions modulo 119 using the Extended Euclidean Algorithm. Consider the first system of congruences. Since 7 and 17 are coprime, by Bézout's lemma there exists a solution to $7x + 17y = 1$. For example, $x = 5$ and $y = -2$. By multiplying both sides of the above equality by $9 - 2 = 7$, we can find a solution to $7x' + 17y' = 9 - 2 = 7$, namely $x' = 7x = 35$ and $y' = 7 \cdot (-2) = -14$. But then $x_1 = 2 + 7x' = 9 - 17y' = 247$ satisfies $x_1 \equiv 2 \pmod 7$ and $x_1 \equiv 9 \pmod{17}$. Therefore $x \equiv 247 \equiv 9 \pmod{119}$ is a solution. The second system of congruences can be solved analogously and gives us a solution $x \equiv 111 \pmod{119}$.

Exercise 12.4. Give examples of polynomials with coefficients in $\mathbb{Z}_8$ and $\mathbb{Z}_{15}$ for which the conclusion of Proposition 12.2 does not hold.
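The Bézout computation used to combine the residues modulo 7 and 17 can be sketched in a few lines of Python. This is only an illustration: the helper names `extended_gcd` and `crt_pair` are hypothetical and do not come from the notes.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and a*s + b*t == g."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - (a // b) * t


def crt_pair(r1, m1, r2, m2):
    """Solve x = r1 (mod m1), x = r2 (mod m2) for coprime moduli m1, m2."""
    g, s, _ = extended_gcd(m1, m2)
    if g != 1:
        raise ValueError("moduli must be coprime")
    # m1*s = 1 (mod m2), so r1 + m1*s*(r2 - r1) is r1 mod m1 and r2 mod m2.
    return (r1 + m1 * s * (r2 - r1)) % (m1 * m2)


# The two systems from the example: x = 2 or 6 (mod 7), x = 9 (mod 17).
solutions = sorted(crt_pair(r, 7, 9, 17) for r in (2, 6))
print(solutions)
```

For the first system the sketch finds the Bézout coefficient 5 of 7, exactly as in the hand computation, and reduces 247 modulo 119.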

13 The Discrete Logarithm Problem. The Order of Elements in $\mathbb{Z}_n^*$

Let $n$ be a modulus. We already looked at certain kinds of equations in $\mathbb{Z}_n$. For example, in Section 6, we learned that neither $[x]^2 + [y]^2 = [3]$ in $\mathbb{Z}_4$ nor $[x]^2 + [y]^2 + [z]^2 = [7]$ in $\mathbb{Z}_8$ has solutions. In Section 8, we studied the equation $[a][x] = [b]$ in $\mathbb{Z}_n$ and saw that the usual application of the Extended Euclidean Algorithm allows us to produce all of its solutions.

Now we want to understand how to handle exponential equations in $\mathbb{Z}_n$. In these kinds of equations, we are given residue classes $[a]$ and $[b]$ from $\mathbb{Z}_n$, and we want to determine all integer solutions $x$ to the equation $[a]^x = [b]$. This is essentially the same as solving the congruence
$$a^x \equiv b \pmod n.$$
The problem of finding solutions to these exponential equations is known as the discrete logarithm problem, or DLP.

Example 13.1. In Section 10, we already saw an example of an exponential equation in $\mathbb{Z}_n$, namely
$$a^x \equiv 1 \pmod n.$$
According to Euler's Theorem, this equation always has a non-zero solution whenever $a$ and $n$ are coprime. In particular, any $x \equiv 0 \pmod{\varphi(n)}$ satisfies the above congruence, for if $x = \varphi(n) k$ for some integer $k$, then
$$a^x \equiv a^{\varphi(n) k} \equiv (a^{\varphi(n)})^k \equiv 1^k \equiv 1 \pmod n.$$
However, we do not yet know whether there are other solutions to this equation. Depending on the choice of $a$, there might exist other solutions as well.

In general, the discrete logarithm problem is hard to solve. This problem lies at the foundation of certain cryptosystems, which we will study in the future. Examples include the ElGamal encryption scheme and the Diffie-Hellman key exchange. There are algorithms which solve the discrete logarithm problem, such as Shanks's baby-step giant-step algorithm, or the number field sieve. None of these algorithms runs in polynomial time. However, just like for the problem of integer factorization, there are quantum algorithms which compute the discrete logarithm in polynomial time.

In these notes, when solving the discrete logarithm problem, we will use brute force or apply Euler's Theorem.
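As a rough illustration of the brute-force approach, the following Python sketch tries the exponents $x = 0, 1, 2, \ldots$ in turn. The function name `discrete_log_bruteforce` is hypothetical, and the loop takes up to $n$ steps, so it is exponential in the number of digits of $n$.

```python
def discrete_log_bruteforce(a, b, n):
    """Return the smallest x >= 0 with a**x = b (mod n), or None if none exists."""
    target = b % n
    power = 1 % n  # a**0 reduced modulo n
    # The powers of a take at most n distinct values modulo n,
    # so every value that ever occurs shows up for some x < n.
    for x in range(n):
        if power == target:
            return x
        power = (power * a) % n
    return None


print(discrete_log_bruteforce(3, 13, 17))  # smallest x with 3**x = 13 (mod 17)
print(discrete_log_bruteforce(2, 3, 7))    # 3 is not a power of 2 modulo 7
```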

In order to understand what solutions to $a^x \equiv b \pmod n$ look like, we need to understand certain fundamental properties of the group of units $\mathbb{Z}_n^*$.

Definition 13.2. If $\alpha \in \mathbb{Z}_n^*$, the order of $\alpha$ is the smallest exponent $k \ge 1$ such that $\alpha^k = 1$. The order is denoted by $k = \operatorname{ord}(\alpha)$ or, if $\alpha = [a]$ for some integer $a$, by $k = \operatorname{ord}(a)$.

From Euler's Theorem, it follows that for all $\alpha \in \mathbb{Z}_n^*$ it is the case that $\operatorname{ord}(\alpha) \le \varphi(n)$. In fact, a much stronger result holds.

Proposition 13.3 (Proposition 5.5 in Frank Zorzitto, A Taste of Number Theory). Let $\alpha \in \mathbb{Z}_n^*$. A positive integer $m$ satisfies $\alpha^m = 1$ if and only if $\operatorname{ord}(\alpha) \mid m$. Consequently, $\operatorname{ord}(\alpha) \mid \varphi(n)$.

Proof. Let $k = \operatorname{ord}(\alpha)$ and suppose that $\alpha^m = 1$. We apply the Remainder Theorem and write $m = kq + r$, where $0 \le r < k$. Then, since $\alpha^k = 1$, we obtain
$$1 = \alpha^m = \alpha^{kq + r} = (\alpha^k)^q \alpha^r = 1^q \alpha^r = \alpha^r.$$
Since $k$ is the smallest positive integer satisfying $\alpha^k = 1$, it must be the case that $r = 0$, so $k \mid m$. For the converse, let $m = kq$. Then
$$\alpha^m = \alpha^{kq} = (\alpha^k)^q = 1^q = 1.$$
Finally, according to Euler's Theorem it is the case that $\alpha^{\varphi(n)} = 1$. But then it follows from what we proved above that $\operatorname{ord}(\alpha) \mid \varphi(n)$.

Example 13.4. Let us determine $\operatorname{ord}(\alpha)$ in $\mathbb{Z}_n^*$ for $n = 17$ and $\alpha = [3]$. We have $\varphi(n) = 16$. Note that $D = \{1, 2, 4, 8, 16\}$ is the complete list of positive divisors of $\varphi(n)$. It follows from Proposition 13.3 that $\operatorname{ord}(\alpha) \in D$. Thus, in order to find the order of $\alpha$, we just need to iterate over all elements in $D$. The smallest element $d$ satisfying $[3]^d = [1]$ is the order. We have
$$3^1 \equiv 3 \pmod{17},$$
$$3^2 \equiv 9 \pmod{17},$$
$$3^4 \equiv (3^2)^2 \equiv 81 \equiv -4 \pmod{17},$$
$$3^8 \equiv (3^4)^2 \equiv (-4)^2 \equiv 16 \equiv -1 \pmod{17},$$
$$3^{16} \equiv (3^8)^2 \equiv (-1)^2 \equiv 1 \pmod{17}.$$

Thus we see that $\operatorname{ord}(\alpha) = 16$, which is the largest possible order that an element of $\mathbb{Z}_{17}^*$ can attain. Note that there was no need for us to compute $3^{16}$ modulo 17, because we know the result from Euler's Theorem. In contrast, consider the element $\beta = [9]$ in $\mathbb{Z}_{17}^*$. We have
$$9^8 \equiv (3^2)^8 \equiv 3^{16} \equiv 1 \pmod{17},$$
which means that $\operatorname{ord}(\beta) \le 8$. Convince yourself that, in fact, $\operatorname{ord}(\beta) = 8$.

Proposition 13.3 allows us to classify all solutions to the exponential equation $[a]^x = [b]$.

Proposition 13.5. Let $[a], [b]$ be elements of $\mathbb{Z}_n^*$. If $x$ satisfies the equation $[a]^x = [b]$, then all solutions $x'$ to this equation satisfy $x' \equiv x \pmod{\operatorname{ord}(a)}$.

Proof. Let $x$ be a solution to $a^x \equiv b \pmod n$ and let $k = \operatorname{ord}(a)$. By the Remainder Theorem, we can write $x = kq + r$, where $0 \le r < k$. But then
$$a^x \equiv a^{kq + r} \equiv (a^k)^q a^r \equiv 1 \cdot a^r \equiv a^r \pmod n.$$
Thus, without loss of generality, we may assume that $0 \le x < k$. Now suppose that there exists some other $x'$ such that $a^{x'} \equiv b \pmod n$. Once again, without loss of generality we may assume that $0 \le x \le x' < k$. But then
$$a^x \equiv b \equiv a^{x'} \pmod n$$
implies
$$a^{x' - x} \equiv 1 \pmod n.$$
Since $0 \le x' - x < k$, it must be the case that $x = x'$, for otherwise we would get a contradiction to the fact that $k$ is the smallest positive integer satisfying $a^k \equiv 1 \pmod n$. Therefore all solutions to $[a]^x = [b]$ are of the form $x' \equiv x \pmod{\operatorname{ord}(a)}$.

Example 13.6. Let us compare the solutions to the exponential equations
$$3^x \equiv 1 \pmod{17} \quad \text{and} \quad 9^y \equiv 1 \pmod{17}.$$

In the first case, we see that the congruence $x \equiv 0 \pmod{16}$ captures all solutions. However, in the second case, even though $y \equiv 0 \pmod{16}$ does provide solutions, it clearly does not cover all of the possibilities because, for example, $y = 8$ also satisfies $9^y \equiv 1 \pmod{17}$. In fact, Proposition 13.5 implies that the solutions are of the form $y \equiv 0 \pmod 8$.

We conclude this section with several general observations about orders of elements of $\mathbb{Z}_n^*$.

Proposition 13.7 (Proposition 5.6 in Frank Zorzitto, A Taste of Number Theory). If $\alpha \in \mathbb{Z}_n^*$ and $k = \operatorname{ord}(\alpha)$, then the list
$$\alpha, \alpha^2, \alpha^3, \ldots, \alpha^k = 1$$
does not repeat itself.

Proof. Suppose that we have a repetition $\alpha^i = \alpha^j$, where $1 \le i < j \le k$. Thus $\alpha^{j - i} = 1$. Since $1 \le j - i < k$, this contradicts the minimality of $k$ as the order of $\alpha$.

Proposition 13.8 (Proposition 5.7 in Frank Zorzitto, A Taste of Number Theory). If $\alpha \in \mathbb{Z}_n^*$ and $k = \operatorname{ord}(\alpha)$, then
$$\operatorname{ord}(\alpha^j) = \frac{k}{\gcd(j, k)}.$$

Proof. Let $\operatorname{ord}(\alpha^j) = l$. We will show that $l = k / \gcd(j, k)$. Note that
$$\alpha^{jl} = (\alpha^j)^l = 1.$$
It follows from Proposition 13.3 that $k \mid jl$. That is, $jl = ku$ for some integer $u$. But then
$$\frac{j}{\gcd(j, k)}\, l = \frac{k}{\gcd(j, k)}\, u,$$
and since $j / \gcd(j, k)$ and $k / \gcd(j, k)$ are coprime, it follows from Proposition 3.13 that $k / \gcd(j, k)$ divides $l$. On the other hand, since $k$ is the order of $\alpha$,
$$(\alpha^j)^{k / \gcd(j, k)} = (\alpha^k)^{j / \gcd(j, k)} = 1^{j / \gcd(j, k)} = 1.$$
By Proposition 13.3 applied to the order of $\alpha^j$, we obtain that $l \mid k / \gcd(j, k)$. Since $k / \gcd(j, k) \mid l$ and $l \mid k / \gcd(j, k)$, we conclude that $l = k / \gcd(j, k)$.
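The definition of the order and the formula $\operatorname{ord}(\alpha^j) = k / \gcd(j, k)$ are easy to check numerically. Below is a minimal Python sketch; the function name `order` is hypothetical, and it simply multiplies by $a$ until the power returns to 1.

```python
from math import gcd


def order(a, n):
    """Smallest k >= 1 with a**k = 1 (mod n); requires n >= 2 and gcd(a, n) = 1."""
    if n < 2 or gcd(a, n) != 1:
        raise ValueError("a must be a unit modulo n >= 2")
    k, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        k += 1
    return k


k = order(3, 17)
print(k, order(9, 17))
# Compare ord(3**j) with k / gcd(j, k) for every exponent j = 1, ..., 16.
print(all(order(pow(3, j, 17), 17) == k // gcd(j, k) for j in range(1, 17)))
```

The two orders printed first are the ones found by hand for $[3]$ and $[9]$ in $\mathbb{Z}_{17}^*$.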

Corollary 13.9 (Proposition 5.9 in Frank Zorzitto, A Taste of Number Theory). Let $\alpha$ be an element of $\mathbb{Z}_n^*$. Then $\operatorname{ord}(\alpha^j) = \operatorname{ord}(\alpha)$ if and only if $\gcd(j, \operatorname{ord}(\alpha)) = 1$.

Proposition 13.10 (Proposition 5.16 in Frank Zorzitto, A Taste of Number Theory). Let $\alpha, \beta$ in $\mathbb{Z}_n^*$ have orders $k$ and $l$, respectively. If $k$ and $l$ are coprime then $\operatorname{ord}(\alpha\beta) = kl$.

Proof. Let $m = \operatorname{ord}(\alpha\beta)$. Since
$$(\alpha\beta)^{kl} = \alpha^{kl} \beta^{kl} = (\alpha^k)^l (\beta^l)^k = 1^l \cdot 1^k = 1,$$
we see from Proposition 13.3 that $m \mid kl$. We will now show that $kl \mid m$. Since $\gcd(k, l) = 1$, it follows from Proposition 3.12 that we only need to demonstrate $k \mid m$ and $l \mid m$. On one hand,
$$(\alpha^m)^k = \alpha^{mk} = (\alpha^k)^m = 1^m = 1 \quad \text{and} \quad (\beta^m)^l = \beta^{ml} = (\beta^l)^m = 1^m = 1.$$
On the other hand,
$$(\alpha^m)^l = (\alpha^m)^l \cdot 1 = (\alpha^m)^l (\beta^m)^l = (\alpha^m \beta^m)^l = ((\alpha\beta)^m)^l = 1^l = 1.$$
It follows from the above calculations, as well as from Proposition 13.3, that $k \mid ml$. Since $k$ and $l$ are coprime, Proposition 3.13 allows us to conclude that $k \mid m$. We can carry out an analogous calculation to show that $(\beta^m)^k = 1$, which would imply $l \mid m$. But then $kl \mid m$, and since we already demonstrated that $m \mid kl$, it must be the case that $m = kl$.
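The statement about products of elements with coprime orders can also be checked exhaustively for a small modulus. The sketch below is an illustration only: the helper `order` is hypothetical, and the modulus 31 is an arbitrary choice.

```python
from math import gcd


def order(a, n):
    """Smallest k >= 1 with a**k = 1 (mod n), for a unit a modulo n >= 2."""
    k, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        k += 1
    return k


n = 31  # arbitrary small modulus chosen for the check
units = [a for a in range(1, n) if gcd(a, n) == 1]
# Whenever ord(a) and ord(b) are coprime, ord(a*b) should be their product.
ok = all(
    order(a * b % n, n) == order(a, n) * order(b, n)
    for a in units
    for b in units
    if gcd(order(a, n), order(b, n)) == 1
)
print(ok)
print(order(2, 7), order(6, 7), order(2 * 6 % 7, 7))
```

The last line looks at $\mathbb{Z}_7^*$, where $[2]$ has order 3, $[6]$ has order 2, and their product $[5]$ has order 6.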

14 The Primitive Root Theorem

Let $n$ be a modulus. The elements $\alpha \in \mathbb{Z}_n^*$ whose order is equal to $\varphi(n)$ deserve special attention. According to Proposition 13.7, they allow us to generate the whole group $\mathbb{Z}_n^*$ simply by computing the powers
$$\alpha, \alpha^2, \ldots, \alpha^{\varphi(n)} = 1.$$
Such elements are called primitive roots and in this section we address the question of their existence in $\mathbb{Z}_n^*$. We will answer this question only partially by proving the Primitive Root Theorem.

Definition 14.1. An element $\alpha \in \mathbb{Z}_n^*$ is called a primitive root if $\operatorname{ord}(\alpha) = \varphi(n)$.

Example 14.2. Let us demonstrate that $\mathbb{Z}_{17}^*$ contains a primitive root. If we reduce the elements in the list $\{3, 3^2, 3^3, \ldots, 3^{16}\}$ modulo 17, then the resulting list is
$$\{3, 9, 10, 13, 5, 15, 11, 16, 14, 8, 7, 4, 12, 2, 6, 1\}.$$
Note that all 16 elements are distinct and they constitute the whole of $\mathbb{Z}_{17}^*$. Not every element in $\mathbb{Z}_{17}^*$ is a primitive root. For example, the observation made above does not hold for the list $\{9, 9^2, 9^3, \ldots, 9^{16}\}$ reduced modulo 17:
$$\{9, 13, 15, 16, 8, 4, 2, 1, 9, 13, 15, 16, 8, 4, 2, 1\}.$$
The first 8 elements are distinct, and starting from the 9th element the pattern repeats. Hence $9, 9^2, \ldots, 9^{\varphi(n)} = 1$ do not produce $\mathbb{Z}_{17}^*$, which is not a surprise, because from Example 13.4 we know that $\operatorname{ord}(9) = 8$.

There are groups which have no primitive roots at all. For example, there are no primitive roots in $\mathbb{Z}_n^*$ whenever $n$ has at least two distinct odd prime divisors. Examples of groups without primitive roots include $\mathbb{Z}_8^*$, $\mathbb{Z}_{12}^*$ and $\mathbb{Z}_{15}^*$, and we leave it as an exercise to the reader to verify that each of these three groups has no primitive roots.

Before jumping into the proof of the Primitive Root Theorem, let us determine how many primitive roots there are in $\mathbb{Z}_n^*$.

Proposition 14.3 (Proposition 5.10 in Frank Zorzitto, A Taste of Number Theory). If $\mathbb{Z}_n^*$ has a primitive root, then the total number of primitive roots in $\mathbb{Z}_n^*$ is $\varphi(\varphi(n))$.

Proof. Let $\alpha$ be a primitive root, so that $\operatorname{ord}(\alpha) = \varphi(n)$ and
$$\alpha, \alpha^2, \ldots, \alpha^{\varphi(n)} = 1$$

cover all of $\mathbb{Z}_n^*$ without repetition. The other primitive roots are those powers $\alpha^j$ in the list for which $\operatorname{ord}(\alpha^j) = \varphi(n) = \operatorname{ord}(\alpha)$. According to Corollary 13.9, these are the powers $\alpha^j$ where $j$ from 1 to $\varphi(n)$ is coprime to $\varphi(n)$, and there are precisely $\varphi(\varphi(n))$ such $j$'s.

We are now ready to state the Primitive Root Theorem.

Theorem 14.4 (The Primitive Root Theorem; Theorem 5.17 in Frank Zorzitto, A Taste of Number Theory). Let $p$ be prime. Then $\mathbb{Z}_p^*$ contains a primitive root.

If you are familiar with the basics of group theory, then you can translate the statement of the theorem into group theoretical language by saying that the group $\mathbb{Z}_p^*$ is cyclic whenever $p$ is prime. In order to prove this result, we need to prove one lemma.

Lemma 14.5 (Proposition 5.15 in Frank Zorzitto, A Taste of Number Theory). Let $p$ be prime. If $\alpha$ is an element of $\mathbb{Z}_p^*$ of order $k$, then
$$\alpha, \alpha^2, \ldots, \alpha^{k-1}, \alpha^k = 1$$
is the complete, non-repeating list of all $\beta$ in $\mathbb{Z}_p^*$ such that $\beta^k = 1$.

Proof. According to Proposition 13.7, the list $\alpha, \alpha^2, \ldots, \alpha^k$ contains no repetitions. Every $\alpha^j$ in the list satisfies
$$(\alpha^j)^k = (\alpha^k)^j = 1^j = 1.$$
Hence every element in the list is a root of the polynomial $x^k - 1$. Since we found $k$ distinct roots of the polynomial $x^k - 1$, whose degree is $k$, it follows from Proposition 12.2 that there are no other roots.

Proof (of Theorem 14.4). Let $\alpha$ be an element of $\mathbb{Z}_p^*$. If $\operatorname{ord}(\alpha) = p - 1$, then $\alpha$ is a primitive root, so we are done. Thus we may assume that $k = \operatorname{ord}(\alpha) < p - 1$. According to Lemma 14.5, the list
$$\alpha, \alpha^2, \ldots, \alpha^k = 1$$
picks up all roots of $x^k - 1$ in $\mathbb{Z}_p^*$. Since $k < p - 1$, there is some $\gamma$ in $\mathbb{Z}_p^*$ which is not on this list. Hence $\gamma^k \neq 1$.

Let $l = \operatorname{ord}(\gamma)$. Notice that $l \nmid k$, for otherwise we would have
$$\gamma^k = (\gamma^l)^{k/l} = 1^{k/l} = 1.$$
This means that in the unique factorizations of $k$ and $l$, there is a prime number $q$ that appears more often in $l$ than it does in $k$. Therefore
$$k = q^d k_1 \quad \text{and} \quad l = q^e l_1,$$
where $0 \le d < e$ and $q \nmid k_1$, $q \nmid l_1$. Let $\beta = \alpha^{q^d} \gamma^{l_1}$. Then, according to Proposition 13.8,
$$\operatorname{ord}(\alpha^{q^d}) = \frac{k}{\gcd(k, q^d)} = \frac{k}{q^d} = k_1, \qquad \operatorname{ord}(\gamma^{l_1}) = \frac{l}{\gcd(l, l_1)} = \frac{l}{l_1} = q^e.$$
Since $k_1$ and $q^e$ are coprime, it follows from Proposition 13.10 that
$$\operatorname{ord}(\beta) = \operatorname{ord}(\alpha^{q^d} \gamma^{l_1}) = \operatorname{ord}(\alpha^{q^d}) \operatorname{ord}(\gamma^{l_1}) = q^e k_1 > q^d k_1 = k = \operatorname{ord}(\alpha).$$
In this way, new elements of strictly increasing order can be found in $\mathbb{Z}_p^*$, until we reach some element of the largest possible order $\varphi(p) = p - 1$. By definition, this element is a primitive root.

In conclusion, we provide a statement of the Generalized Primitive Root Theorem, which provides a full classification of moduli $n$ such that $\mathbb{Z}_n^*$ contains a primitive root. Due to time limitations, we will refrain from proving this result.

Theorem 14.6 (Generalized Primitive Root Theorem). The group of units $\mathbb{Z}_n^*$ contains a primitive root if and only if $n = 2$, $4$, an odd prime power, or an odd prime power multiplied by two.

15 Big-O Notation

Before we proceed to the discussion of primality tests and integer factorization algorithms, let us introduce several important definitions. When analyzing the performance of algorithms, we will often be using the big-O notation and the notion of a polynomial time (or subexponential time or exponential time) algorithm.

Definition 15.1. Let $f(n)$ and $g(n)$ be two functions of $n$. We say that $f(n) = O(g(n))$ if there exists a positive real number $M$ such that $|f(n)| \le M |g(n)|$ for all sufficiently large $n$.

Example 15.2. Let $f(n) = n^2 + 4n + 7$ and $g(n) = n^3$. Note that
$$12 = f(1) > g(1) = 1, \quad 19 = f(2) > g(2) = 8, \quad 28 = f(3) > g(3) = 27, \quad 39 = f(4) < g(4) = 64, \quad 52 = f(5) < g(5) = 125, \ldots$$
so we see that, even though $f(n)$ dominates $g(n)$ for $n = 1, 2, 3$, the pattern changes for $n = 4, 5$, and in fact it so happens that $f(n) < g(n)$ for all $n \ge 4$. Thus $f(n) = O(g(n))$. Note, however, that $g(n) \neq O(f(n))$.

Another example is $f(n) = e^n$ and $g(n) = 5e^n + e^{n/2}$. Evidently, $f(n) \le g(n)$, so $f(n) = O(g(n))$. However, one may also notice that $e^{n/2} \le e^n$, and this implies that
$$g(n) = 5e^n + e^{n/2} \le 5e^n + e^n = 6e^n = 6 f(n),$$
which means that $g(n) = O(f(n))$. In this case, we say that $f(n)$ and $g(n)$ have the same asymptotic behaviour as $n$ approaches infinity.

The big-O notation is used in order to simplify $f(n)$ whenever we are interested not in its precise form, but rather in its behaviour for very large $n$. For example, the function
$$f(n) = n^5 + 2e^n + 3\log(n)$$
simplifies to $f(n) = O(e^n)$, because $2e^n$ dominates all other summands present above (note that $3\log(n) < n^5 < 2e^n$ for sufficiently large $n$). Also, according to our definition, we may ignore the constant 2 in front of $2e^n$, because it is present implicitly in the expression $f(n) = O(e^n)$. Thus, when writing a certain expression in its big-O form, all that we need to do is to identify some simple function that dominates $f(n)$, and we want to pick this function in the best way possible. Say, in the example above we could have written $f(n) = O(e^{2n})$, but this is a less sharp estimate than $f(n) = O(e^n)$, because $e^{2n}$ grows much faster than $e^n$. Thus the expression $f(n) = O(e^n)$ gives us more information about the function $f(n)$ than the expression $f(n) = O(e^{2n})$.

The most common types of functions that we will encounter are

- $O(1)$: at most constant growth;
- $O(\log n)$: at most logarithmic growth;
- $O(n^k)$: at most polynomial growth ($k > 0$);
- $O\left(\exp\left(c n^{1/k}\right)\right)$: at most subexponential growth ($c > 0$, $k > 1$);
- $O(\exp(cn))$: at most exponential growth ($c > 0$).

When analyzing the performance of algorithms, the function $f(n)$ will represent the number of steps required for the algorithm to terminate given the input $n$. For example, it was proved by Gabriel Lamé that the computation of $\gcd(a, b)$ with the Euclidean algorithm requires at most $5\log_{10}(\min\{a, b\})$ steps, and this allows us to conclude that the performance of the Euclidean algorithm is $O(\log(\min\{a, b\}))$. So the number of steps required for the algorithm to terminate grows logarithmically as $\min\{a, b\}$ approaches infinity.

Definition 15.3. Suppose that an algorithm takes a positive integer $n$ as its input. We say that the algorithm works in polynomial time if there exists a positive real number $k$ such that the number of steps required for it to terminate is $O\left((\log n)^k\right)$.

Once again, consider the Euclidean Algorithm. As the number of steps required to compute $\gcd(a, b)$ is $O(\log(\min\{a, b\}))$, we see that we may take $k = 1$ in order to conclude that the algorithm works in polynomial time.

This may seem a bit strange, because $(\log n)^k$ is not a polynomial function (compare it to, say, $n^2$ or $n^3 + n + 7$, which are polynomials). But when talking about an algorithm, we are interested in its performance not with respect to an input $n$, but rather with respect to the size of the input. You may think of the size of $n$ as the number of decimal digits of $n$. This number never exceeds $\log_{10} n + 1$, so it is logarithmic in terms of $n$. So, if we provide a seven-digit number $n$ as an input to some algorithm, roughly speaking we would consider it efficient if it terminates in $7^k$ steps for some positive integer $k$ (note that 7 is the number of decimal digits of $n$) rather than in $n^k$ steps.

From this perspective, any algorithm which works in $O(n) = O(e^{\log n})$ would actually be considered an algorithm which works in exponential time. Such algorithms are usually practical only for relatively small values of $n$.

Example 15.4. Here are some examples of famous algorithms and their asymptotic running times.

- One of the fast algorithms for integer multiplication is the Toom-Cook Multiplication Algorithm. Given two
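To see the logarithmic behaviour of the Euclidean algorithm concretely, the following Python sketch counts its division steps and compares them with five times the number of decimal digits of the smaller argument, which is the usual way Lamé's bound is stated. The function name `euclid_steps` is hypothetical.

```python
def euclid_steps(a, b):
    """Number of division steps the Euclidean algorithm performs on (a, b)."""
    steps = 0
    while b:
        a, b = b, a % b
        steps += 1
    return steps


# Consecutive Fibonacci numbers are the classical worst case.
for a, b in [(13, 8), (89, 55), (144, 89), (10**12 + 39, 10**6 + 3)]:
    digits = len(str(b))  # size of the smaller input in decimal digits
    print(a, b, euclid_steps(a, b), 5 * digits)
```

In every row the step count stays at or below the bound in the last column, even when the inputs have a dozen digits.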

positive integers $a$ and $b$, for $d = \log(\max\{a, b\})$ this algorithm requires $O(d^{c})$ steps for a fixed constant $c$, so it works in polynomial time;
- Shanks's Baby-Step Giant-Step Algorithm, which was invented in 1971, allows one to compute discrete logarithms modulo $n$. If $d = \log n$, then the running time of the algorithm is $O(\sqrt{n}) = O(e^{d/2})$, so it works in exponential time;
- The General Number Field Sieve is the fastest algorithm known to date for factoring large integers. If $n$ is an integer and $d = \log n$, the algorithm works in $O\left(e^{2 d^{1/3} (\log d)^{2/3}}\right)$. The constant 2 in this expression is not optimal. We see that this algorithm is neither polynomial, nor exponential. These types of algorithms are called subexponential.

16 Primality Testing

For more details, please refer to the monograph by R. Crandall, C. Pomerance, Prime Numbers: A Computational Perspective.

As was mentioned in the introduction, number theory is heavily used in cryptography. In the upcoming sections, we will look at several cryptographic protocols, all of which, in one way or another, involve primality testing. For example, in order to ensure that the communication provided by the RSA cryptosystem is secure, one has to be able to generate a pair of very large prime numbers (several thousands of bits). But how do we ensure that some given number $n$ is prime, when we know that the factorization of large integers is infeasible for electronic computers? It turns out that there are several alternative ways to verify that $n$ is prime, which do not require the factorization of $n$. There are three kinds of primality tests out there, namely

1. Heuristic tests: tests that work well in practice, but rest on a heuristic explanation rather than on a proof (Fermat's Primality Test);
2. Probabilistic tests: given $n$, these tests verify whether $n$ is a pseudoprime, i.e., whether it is prime with a very large probability (Miller-Rabin Primality Test);
3. Deterministic tests: given $n$, these tests guarantee the primality or the compositeness of $n$ (trial division, AKS Primality Test, Elliptic Curve Primality Test).

In this section, we will study the trial division method, Fermat's Primality Test and the Miller-Rabin Primality Test. If time allows, by the end of the course we will see one of the fast deterministic primality tests, namely the Elliptic Curve Primality Test.

We remark that the best known primality test, the AKS Primality Test, was invented by the Indian mathematicians Manindra Agrawal, Neeraj Kayal and Nitin Saxena. To this day, it is the only deterministic unconditional polynomial time algorithm for primality testing. In 2005, its asymptotic running time was improved by C. Pomerance and H. W. Lenstra, Jr. to $\tilde{O}((\log n)^6)$. Despite all of its benefits, the probabilistic Miller-Rabin Primality Test is used in practice more often. If $k$ denotes the number of times the algorithm has to run before we conclude that $n$ is a pseudoprime, the asymptotic running time of the Miller-Rabin Primality Test is $O(k (\log n)^3)$.

16.1 Trial Division

What is the most obvious way of determining whether a given integer $n \ge 2$ is composite? Well, one just has to find one of its non-trivial factors! That is, if we can show that there exists some integer $d$ such that $d \mid n$ and $1 < d < n$, then $n$ is composite. For example, if $n = 35$, we just have to check that $2 \nmid 35$, $3 \nmid 35$, $4 \nmid 35$, until we find out that $5 \mid 35$. Therefore, 35 is a composite number. Of course, if we consider $n = 37$, a problem arises, as now we have to check $2 \nmid 37$, $3 \nmid 37$, ..., $36 \nmid 37$, until we find out that $n$ is prime. Fortunately, as the following proposition suggests, there is no need to check all $n - 2$ numbers in between 1 and $n$ to be certain that $n$ is prime.

Proposition 16.1. For any composite integer $n \ge 2$ there exists a divisor $d$ such that $1 < d \le \sqrt{n}$. Furthermore, we may assume that $d$ is prime.

Proof. Let $n = dk$ for some non-trivial divisors $d$ and $k$. If we now suppose that both $d > \sqrt{n}$ and $k > \sqrt{n}$, then $dk > n$, a contradiction. Therefore either $1 < d \le \sqrt{n}$ or $1 < k \le \sqrt{n}$ holds. Without loss of generality, assume the former. Since Theorem 2.7 asserts the existence of a prime $p$ dividing $d$, and $d \mid n$, we see that $p \mid n$ and $1 < p \le d \le \sqrt{n}$.

Now we may adjust our primality test as follows. Let $\lfloor x \rfloor$ denote the largest integer $\le x$. According to Proposition 16.1, in order to verify that $n$ is prime, we

just have to ensure that $2 \nmid n$, $3 \nmid n$, ..., $\lfloor\sqrt{n}\rfloor \nmid n$. For example, in the case of $n = 37$, we have $\lfloor\sqrt{37}\rfloor = 6$, and $2 \nmid 37$, $3 \nmid 37$, ..., $6 \nmid 37$. Therefore 37 is prime. Thus we were able to reduce the number of steps in our primality test from $n - 2$ to $\lfloor\sqrt{n}\rfloor - 1$. Quite a significant improvement!

We can actually do slightly better. According to Proposition 16.1, we can limit ourselves only to prime divisors of $n$. So, in the case of $n = 37$, there was no need to check its divisibility by 4 or 6, since these numbers are composite. So we could reach the same conclusion simply by testing $2 \nmid 37$, $3 \nmid 37$ and $5 \nmid 37$.

In order to make this further improvement, we need to know all prime numbers $\le \sqrt{n}$. Fortunately, there is a rather simple method called the Sieve of Eratosthenes, which allows us to produce all prime numbers up to $X$ in $O(X \log\log X)$ steps (see Assignment 3). The method was discovered by the Greek mathematician Eratosthenes of Cyrene (c. 250 BC), and goes as follows:

1. Initialize a table $A$ of $X$ elements by setting $A[1] = 1$ and $A[i] = 0$ for all $2 \le i \le X$;
2. Let $p = 2$;
3. Set $A[2p] = 1$, $A[3p] = 1$, $A[4p] = 1$, and so on, for all multiples of $p$ in the table $A$;
4. Change $p$ to the smallest index $k > p$ such that $A[k] = 0$. If $p > X$, terminate. Otherwise, return to step 3.

In the end, all elements $i$ such that $A[i] = 0$ will correspond to prime numbers. It follows from Mertens' Second Theorem that the asymptotic running time of the Sieve of Eratosthenes is $O(X \log\log X)$ (see Assignment 3). The sieve can be sped up in practice if we start eliminating not from $2p$ (i.e. $2p$, $3p$, $4p$, and so on), but from $p^2$, thus crossing out $p^2$, $(p + 1)p$, $(p + 2)p$, etc. The improvement becomes evident once we note that by the time the algorithm reaches the prime $p$, the numbers $2p, 3p, \ldots, (p - 1)p$ have already been eliminated by some prime less than $p$.

Of course, it is impractical to run the Sieve of Eratosthenes up to $\sqrt{n}$ each time we try to factor $n$, as then the asymptotic running time will always be at least of order $\sqrt{n}$.

This is why in practice one usually runs the Sieve of Eratosthenes up to some large bound first, then stores all of the prime numbers in a table, and later uses this table to factor integers. It follows from the Prime Number Theorem that the number of
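The sieve and the resulting trial division test can be sketched as follows in Python. This is a minimal illustration, not an optimized implementation: the names `sieve` and `is_prime_trial` are hypothetical, the table is a Boolean list rather than the 0/1 table $A$, and crossing out starts at $p^2$ as in the variant discussed above.

```python
from math import isqrt


def sieve(limit):
    """Return the list of all primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    composite = [False] * (limit + 1)
    primes = []
    for p in range(2, limit + 1):
        if composite[p]:
            continue
        primes.append(p)
        # Smaller multiples of p were already crossed out by smaller primes.
        for multiple in range(p * p, limit + 1, p):
            composite[multiple] = True
    return primes


def is_prime_trial(n):
    """Trial division by the primes up to floor(sqrt(n))."""
    if n < 2:
        return False
    return all(n % p != 0 for p in sieve(isqrt(n)))


print(sieve(30))
print(is_prime_trial(35), is_prime_trial(37))
```

In a real application the table of primes would be computed once and reused, rather than rebuilt for every $n$ as this sketch does.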

primes $\le X$ is $O(X / \log X)$. So, assuming that the table of prime numbers up to $\sqrt{n}$ is given to us a priori, the trial division will now take $O(\sqrt{n} / \log n)$ steps instead of $O(\sqrt{n})$.

Note the power of this method: for example, given a number $n \approx 10^{12}$, we just have to check $p \nmid n$ for all primes $p \le 10^6$. Given the table containing the prime numbers less than a million, this verification can be done by the computer almost immediately. In fact this method should work quite fast for all numbers with at most 18 decimal digits. However, when the number of digits of $n$ exceeds 18, things start to get more complicated: there are too many prime numbers to check, and it is difficult to fit all of them into memory at once.

16.2 Fermat's Primality Test

Another interesting way of demonstrating that a number $n$ is composite is to use Fermat's Little Theorem, which states that, if $n$ is prime and $a$ is coprime to $n$, then
$$a^n \equiv a \pmod n.$$
Therefore all that we have to do to prove that $n$ is composite is to find $a$ such that $a^n \not\equiv a \pmod n$. If $a$ satisfies such a property, we call it a witness for the non-primality of $n$. In practice, the computation of $a^n \pmod n$ can be done relatively fast using the Double-and-Add Algorithm.

Example 16.2. Let us use Fermat's Primality Test to prove that $n = 323$ is not prime. Note that
$$323 = 256 + 64 + 2 + 1 = 2^8 + 2^6 + 2^1 + 2^0.$$
Now pick a random $a$ such that $1 < a < 323$, say $a = 5$. If $n$ is prime, then Fermat's Little Theorem should hold for $a$. We use the Double-and-Add Algorithm to check whether this is the case:
$$5^2 \equiv 25, \quad 5^4 \equiv (5^2)^2 \equiv 302, \quad 5^8 \equiv (5^4)^2 \equiv 118, \quad 5^{16} \equiv (5^8)^2 \equiv 35,$$
$$5^{32} \equiv (5^{16})^2 \equiv 256, \quad 5^{64} \equiv (5^{32})^2 \equiv 290, \quad 5^{128} \equiv (5^{64})^2 \equiv 120, \quad 5^{256} \equiv (5^{128})^2 \equiv 188 \pmod{323}.$$

Thus
$$5^{323} \equiv 5^{256} \cdot 5^{64} \cdot 5^{2} \cdot 5 \equiv 188 \cdot 290 \cdot 25 \cdot 5 \equiv 23 \not\equiv 5 \pmod{323}.$$
This result allows us to conclude that 323 is not prime. Note, however, that if we randomly picked $a = 18$, $152$, $170$ or any other number for which $a^{323} \equiv a \pmod{323}$ actually holds, we would not be able to draw any conclusion about $n$. Fortunately, for 323 there are only 7 possible $a$'s between 1 and 323 such that $a^{323} \equiv a \pmod{323}$, so the probability of this happening is relatively small. And even if this happens, we could just pick yet another random value of $a$, for which $a^{323} \not\equiv a \pmod{323}$ might be true.

From Example 16.2, the algorithm becomes clear. Let $n$ be an integer, and let $k \ge 1$ be the maximal number of times that we are going to choose $a$ at random. Then do the following:

1. Set $i = 0$;
2. If $i = k$, conclude that $n$ is a pseudoprime. Otherwise pick a random integer $a$ such that $1 < a < n$;
3. Compute $a^n \pmod n$ using the Double-and-Add Algorithm;
4. If $a^n \not\equiv a \pmod n$, conclude that $n$ is composite. Otherwise increment $i$ and go back to step 2.

According to this algorithm, we conclude that $n$ is a pseudoprime whenever $k$ random choices of $a$ result in $a^n \equiv a \pmod n$. In practice, this algorithm works quite well, even though it is purely heuristic. However, there are some special composite numbers which do not admit witnesses of their non-primality at all.

Definition 16.3. A composite integer $n$ is called a Carmichael number whenever
$$a^n \equiv a \pmod n$$
for all integers $a$.
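The four steps of Fermat's Primality Test listed above can be sketched in Python as follows. The function name `fermat_test` and the optional `rng` parameter are hypothetical, and Python's built-in three-argument `pow` is used here in place of a hand-written Double-and-Add routine.

```python
import random


def fermat_test(n, k=20, rng=random):
    """Return False if n is certainly composite, True if n is a pseudoprime."""
    if n < 4:
        return n in (2, 3)
    for _ in range(k):
        a = rng.randrange(2, n)   # random a with 1 < a < n
        if pow(a, n, n) != a:     # a is a witness: a**n is not a (mod n)
            return False
    return True


print(pow(5, 323, 323), pow(18, 323, 323))  # 5 is a witness for 323, 18 is not
print(fermat_test(323), fermat_test(37))
print(fermat_test(561))                     # 561 is a Carmichael number
```

The last call reports 561 as a pseudoprime for every choice of bases, since a Carmichael number has no witnesses at all.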

There exist infinitely many Carmichael numbers, and the first 10 of them are
$$561, 1105, 1729, 2465, 2821, 6601, 8911, 10585, 15841, 29341.$$
They were discovered by the American mathematician Robert Carmichael. What is interesting is that the criterion for determining Carmichael numbers was found by the German mathematician Alwin Korselt in 1899, even before Carmichael numbers were discovered.

Theorem 16.4 (Theorem 5.21 in Frank Zorzitto, A Taste of Number Theory). An integer $n$ is a Carmichael number if and only if

1. $n = p_1 p_2 \cdots p_k$, where $k > 1$ and the $p_j$ are primes without repetition;
2. every $p_j - 1$ divides $n - 1$.

Therefore every Carmichael number will always be regarded as a pseudoprime by Fermat's Primality Test, and this is unavoidable.

16.3 Miller-Rabin Primality Test

This test was originally developed by Gary Miller in 1976 and it was deterministic, but its determinism relied on a reasonable but unproved conjecture, called the Extended Riemann Hypothesis. In 1980, Michael Rabin converted this algorithm into an unconditional, but probabilistic algorithm. This is the algorithm that we are going to learn about.

To understand the idea behind the Miller-Rabin primality test, recall that the congruence $x^2 \equiv 1 \pmod p$ has exactly two solutions, namely $x \equiv \pm 1 \pmod p$, whenever $p$ is an odd prime. This simply follows from Proposition 12.2 applied to the quadratic polynomial $x^2 - 1$ with coefficients in $\mathbb{Z}_p$.

Now let $n > 2$ be prime and let $a$ be coprime to $n$. Then $n - 1 = 2^s d$ for some positive integers $s$ and $d$, where $d$ is odd. According to Fermat's Little Theorem,
$$\left(a^{2^{s-1} d}\right)^2 \equiv a^{2^s d} \equiv a^{n-1} \equiv 1 \pmod n.$$
Thus we see that $a^{2^{s-1} d}$ is a root of $x^2 - 1$ modulo $n$. Since $n$ is prime, $a^{2^{s-1} d} \equiv \pm 1 \pmod n$. If $a^{2^{s-1} d} \equiv -1 \pmod n$, we stop. Otherwise, we can extract the square

root one more time, so that $a^{2^{s-2} d} \equiv \pm 1 \pmod n$, and so on, until we either reach $a^{2^r d} \equiv -1 \pmod n$ for some $r$ or $a^d \equiv 1 \pmod n$. We conclude that, if $n$ is prime, then either

$$a^d \equiv 1 \pmod n;$$

or

$$a^{2^r d} \equiv -1 \pmod n \quad \text{for some } r \text{ such that } 0 \le r \le s - 1.$$

Thus, if we could show that
$$a^d \not\equiv 1 \pmod n \quad \text{and} \quad a^{2^r d} \not\equiv -1 \pmod n$$
for all $r$ such that $0 \le r \le s - 1$, then $n$ has to be composite. Note that with Fermat's Primality Test we would only check whether $a^{2^s d} \equiv 1 \pmod n$, whereas in the Miller-Rabin primality test we perform $s$ checks for $a^d, a^{2d}, \ldots, a^{2^{s-1} d} \pmod n$. As it turns out, this is more than enough to fix many problems that we saw with Fermat's Primality Test. For example, Carmichael numbers can be recognized as composite numbers. Furthermore, one can prove that at least 3/4 of the $a$'s coprime to an odd composite number $n$ are witnesses of $n$'s compositeness. Therefore, the probability that a single round of the Miller-Rabin Test would fail is at most 1/4, which means that after $k$ verifications the probability that $n$ is composite while it is reported as a pseudoprime is at most $1/4^k$.

Unfortunately, one cannot do better than that and predict the location of witnesses in $\mathbb{Z}/n\mathbb{Z}$. Their distribution can be very different, and this is why choosing $a$ at random is better than using $a = 2, 3, 5, \ldots$ iteratively. For example, Arnault found a 397-digit composite number for which all bases $a < 307$ are not witnesses. This number was reported to be prime by the Maple isprime() function, because it picked prime bases $a = 2, 3, 5, \ldots$ iteratively, rather than randomly.

Example 16.5. Let us show that $n = 323$ is composite using the Miller-Rabin Primality Test and base $a = 18$. Note that $a^{323} \equiv a \pmod n$, so if we used Fermat's Primality Test on $n$ only once with this base, it would report $n$ as a pseudoprime. However, $322 = 2 \cdot 161$, and we note that
$$18^{161} \equiv 18 \not\equiv \pm 1 \pmod{323},$$
so $n = 323$ would be reported as composite by the Miller-Rabin Primality Test.
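The checks described in this subsection can be sketched in Python as follows. The names `is_witness` and `miller_rabin` and the optional `rng` parameter are hypothetical; the code writes $n - 1 = 2^s d$ with $d$ odd and then inspects $a^d, a^{2d}, \ldots, a^{2^{s-1} d}$ modulo $n$.

```python
import random


def is_witness(a, n):
    """True if base a proves that the odd integer n > 2 is composite."""
    d, s = n - 1, 0
    while d % 2 == 0:          # write n - 1 = 2**s * d with d odd
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:   # a**d = 1 or a**d = -1 (mod n)
        return False
    for _ in range(s - 1):     # check a**(2**r * d) = -1 for r = 1, ..., s-1
        x = (x * x) % n
        if x == n - 1:
            return False
    return True


def miller_rabin(n, k=20, rng=random):
    """Return False if n is certainly composite, True if n is a pseudoprime."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    return not any(is_witness(rng.randrange(2, n - 1), n) for _ in range(k))


print(pow(18, 161, 323), is_witness(18, 323))
print(miller_rabin(561), miller_rabin(37))
```

Unlike the Fermat test, this sketch rejects the Carmichael number 561 with overwhelming probability, because most bases are witnesses.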

63 17 Public Key Cryptosystems. The RSA Cryptosystem For more details, please refer to the monograph by W. Trappe, L. C. Washington, Introduction to Cryptography with Coding Theory, 2nd edition, Suppose that Alice wants to send a secret message to Bob, and because they are too far away from each other and personal communication is impossible, she needs to send this message over the internet. The channel between Alice s computer and Bob s computer is unprotected. While travelling from one computer to the other, the message passes many times through many different routers, and it is possible to intercept it by listening on the channel. For example, this can be done with packet analyzers like WireShark. Though interception of the message is hardly avoidable, it is possible to protect the information itself through encryption. Since the antiquity, the humanity was using what we now call private key cryptosystems. Perhaps, the most famous example of a private key encryption is the so-called Caesar cypher. According to Suetonius, Julius Caesar used this cypher in order to encrypt messages of military significance. The cypher shifts the message by 3 letters to the left: A X, B Y, C Z, D A,..., Y T, Z V (note that we used Latin alphabet instead of English alphabet). For example, the phrase DEVS EX MACHINA can be encrypted using Caesar s cypher as follows: ABRP BS IXZEFKX Now this cypher is not terribly sophisticated, but back in Caesar s time it was considered quite complex, and surely the receiver would have to know the magical number 3 in order to decrypt it by shifting letters three times to the right. So, as we can see, both the sender and the receiver, along with the encryption/decryption procedure, must agree on some private key, which in this case is equal to 3. 
Many ciphers, such as the Vigenère cipher, the renowned Enigma cipher, or modern ciphers such as the Data Encryption Standard (DES) or Rijndael (AES), work according to this principle: once the sender and the receiver agree on some secret key, they both can encrypt and decrypt messages, thus being able to communicate securely. But what if the sender and the receiver are too far away from each other? If Alice is in Australia and Bob is in Bulgaria, then how can they agree on a secret key? One answer to this problem is public key cryptography. Key insight:

Alice and Bob don't even have to agree on a private key in order to send encrypted messages to each other!

The RSA cryptosystem was invented in 1977 by Ron Rivest, Adi Shamir and Leonard Adleman. It was the first practical widely deployed public key cryptosystem.

This is how RSA works. Bob generates two really large distinct prime numbers $p$ and $q$, and computes $n = pq$, as well as $\varphi(n) = (p-1)(q-1)$. Then he chooses an encryption exponent $e$ such that
$$\gcd(e, \varphi(n)) = 1,$$
and solves the congruence
$$de \equiv 1 \pmod{\varphi(n)}$$
for $d$. Then he sends the public key $(n, e)$ to Alice. Alternatively, he can publish $(n, e)$ on his webpage, thus making this key publicly available to everyone. However, he does not release the private key $(p, q, d)$. No one knows the values of $p$, $q$ and $d$ except for Bob.

Now Alice can use Bob's public key $(n, e)$ to send messages to Bob securely. Suppose that Alice wants to send a message written in English. First, she converts this message into a number $m$. For example, this can be done using the ASCII table. According to the ASCII table, every upper or lower case letter of the English alphabet, every digit, and some special characters like *, $, ! or % correspond to some number between 0 and 127. For example, in the message Hello! the letter H corresponds to 72, the letter e corresponds to 101, and so on:

Character   Base 10   Base 2
H           72        01001000
e           101       01100101
l           108       01101100
o           111       01101111
!           33        00100001

We concatenate the base 2 representations of the ASCII numbers corresponding to our characters, thus obtaining a bigger number $m$:
$$m = \underbrace{01001000}_{\text{H}}\,\underbrace{01100101}_{\text{e}}\,\underbrace{01101100}_{\text{l}}\,\underbrace{01101100}_{\text{l}}\,\underbrace{01101111}_{\text{o}}\,\underbrace{00100001}_{\text{!}}$$
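The conversion between a message and the integer $m$ can be sketched in Python as follows. The helper names `text_to_int` and `int_to_text` are hypothetical; the sketch assumes that the message consists of ASCII characters only and that each character occupies one byte.

```python
def text_to_int(message: str) -> int:
    """Concatenate the 8-bit ASCII codes of the characters into one integer."""
    return int.from_bytes(message.encode("ascii"), "big")


def int_to_text(m: int, length: int) -> str:
    """Read off `length` bytes from m and map them back to ASCII characters."""
    return m.to_bytes(length, "big").decode("ascii")


m = text_to_int("Hello!")
bits = format(m, "048b")  # 6 characters * 8 bits, with leading zeros kept
```

The string `bits` is exactly the concatenation of the 8-bit codes of H, e, l, l, o and !, and `int_to_text(m, 6)` recovers the original message.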

Note that each character fits into 1 byte = 8 bits. Since there are 6 characters in our message, the resulting number $m$ satisfies $0 \le m < 2^{48} = 281474976710656$. Now, if Bob receives this number $m$, he can easily decode the message by reading off 8 bits at a time and matching them to the corresponding character in the ASCII table.

Before encrypting the message, Alice needs to verify that $0 \le m < n$ so that the information will not get lost during the transmission. If it so happens that $m \ge n$, she breaks the message into $k = \lfloor m/n \rfloor + 1$ pieces $m_1, m_2, \ldots, m_k$ such that $0 \le m_i < n$ for all $i$, $1 \le i \le k$, and then sends $m_1, m_2, \ldots, m_k$ to Bob consecutively.

Suppose that $0 \le m < n$. Now Alice uses Bob's public key $(n, e)$ and computes the integer $c$, $0 \le c < n$, such that
$$c \equiv m^e \pmod n.$$
This number $c$ is the result of RSA encryption, and Alice sends this encrypted message to Bob over the unprotected channel. When Bob receives the encrypted message $c$, he can decrypt it and obtain the original message $m$ using the private key $d$:
$$c^d \equiv (m^e)^d \equiv m^{de} \equiv m \pmod n.$$
Note that above we utilized the fact that $de \equiv 1 \pmod{\varphi(n)}$, together with Euler's Theorem.

Example Suppose that Bob chose p = 1597 and q = Then n = pq = = , ϕ(n) = (p − 1)(q − 1) = = Bob chooses the encryption exponent e = and then computes d e (mod ). Now he keeps $p$, $q$ and $d$ secret, and makes $(n, e)$ publicly available. Now, in order to send the message Hi! to Bob, Alice converts it into an integer $m$ using the ASCII table:
$$m = \underbrace{01001000}_{\text{H}}\,\underbrace{01101001}_{\text{i}}\,\underbrace{00100001}_{\text{!}} = 4745505.$$
Alice verifies that $0 \le m < n$, and then computes the encrypted message $c$ with the Double-and-Add Algorithm using Bob's encryption exponent $e$: c m e (mod ).

Then Alice sends c = to Bob. When Bob receives $c$, he computes $m$ with the Double-and-Add Algorithm using his private key $d$: m c d (mod ). After that, using the ASCII table, Bob converts the 3 byte number $m$ into the three character message Hi! which Alice sent to him.

Now why is this method of communication secure? Suppose that some malicious adversary Eve managed to eavesdrop on the unprotected channel and intercept the message $c$. Since Bob's public key $(n, e)$ is available to everyone, Eve also knows both $n$ and $e$. Therefore Eve's goal is, by knowing $(n, e)$ and $c$, to obtain $m$. The most obvious way to solve this problem is to find an integer $d$ such that $de \equiv 1 \pmod{\varphi(n)}$. In order to do so, Eve has to compute $\varphi(n) = (p-1)(q-1)$ by knowing $n$. Unfortunately for Eve, the problem of computing $\varphi(n)$ from $n$ when $n$ is a composite number is difficult, and requires a factorization of $n$. To this day, we do not know any polynomial time factorization algorithms. The best ones, namely the Quadratic Sieve and the Generalized Number Field Sieve, are subexponential. Thus, if we choose $n$ large enough (the National Institute of Standards and Technology (NIST) recommends a lower bound on the size of $n$), the factorization of $n$ becomes infeasible for modern electronic computers, even if the work load is distributed among several supercomputers.

Of course, the numbers $p$, $q$ and $e$ should be chosen by Bob very carefully. For example, if either $p$ or $q$ is really small, then it can be located using trial division. If either $p$ or $q$ is really close to $\sqrt{n}$, say $|p - \sqrt{n}| \le 2n^{1/4}$, then the number $n$ can be factored using Fermat's Factorization Method. If the prime divisors of either $p - 1$ or $q - 1$ are really small, then the number $n$ can be factored using Pollard's $p - 1$ Algorithm (see Assignment 3). If $e$ is chosen so that $d$ is really small, say $d < \frac{1}{3} n^{1/4}$, then it can be calculated in polynomial time O(log n) (see Section in Trappe and Washington).
When sending the message, Alice has to be really cautious as well. For example, if the number $m$ is relatively small in comparison to $n$, then even without the knowledge of $d$ or the factorization of $n$ Eve can decrypt the message using the Short Plaintext Attack (see Section in Trappe and Washington). To solve this problem, Alice can pad her message with some random characters either at the beginning or at the end. So as you can see, there are many things that both Alice and Bob have to check before establishing a secure communication.
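The key generation, encryption and decryption steps of RSA can be sketched in Python as follows. The primes 61 and 53 and the exponent 17 are toy values chosen for this illustration only; they are not the values of the example in these notes and are far too small to be secure. The function names are hypothetical, and `pow(e, -1, phi)` assumes Python 3.8 or later.

```python
from math import gcd


def make_keys(p: int, q: int, e: int):
    """Return the public key (n, e) and the private exponent d."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to phi(n)")
    d = pow(e, -1, phi)  # solves d * e = 1 (mod phi(n))
    return (n, e), d


def encrypt(m: int, public_key) -> int:
    n, e = public_key
    if not 0 <= m < n:
        raise ValueError("message must satisfy 0 <= m < n")
    return pow(m, e, n)  # c = m^e mod n


def decrypt(c: int, d: int, n: int) -> int:
    return pow(c, d, n)  # m = c^d mod n


public_key, d = make_keys(61, 53, 17)  # toy parameters, assumed for illustration
c = encrypt(65, public_key)
recovered = decrypt(c, d, public_key[0])
```

The built-in three-argument `pow` performs fast modular exponentiation, which plays the role of the Double-and-Add Algorithm mentioned in the text.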

The RSA cryptosystem can be utilized not only for secure communication, but also for authentication purposes. Imagine a situation when Alice sends a message $m$ to Bob, and Bob cares not so much about the privacy of their communication, but rather about the authenticity of the sender. That is, he wants to be absolutely sure that the message $m$ was sent to him by Alice and no one else. The way this can be done using RSA is as follows: Alice puts a digital signature $s$ on the message $m$ using her private key $d$:
$$s \equiv m^d \pmod n.$$
Then she sends $(m, s)$ to Bob. When Bob receives the message with Alice's signature, he can verify that it belongs to Alice by using her public key $e$ and checking that
$$m \equiv s^e \pmod n.$$

Exercise Use your favourite computer algebra system to encrypt the message m = with RSA using the public key $(n, e) = (786073, 221891)$. Then break the system by factoring $n = pq$, determining the private key $d$, and then decrypting the message c =

Exercise Use your favourite computer algebra system to verify that the message (m,s) = (100, ) belongs to the owner of the public key (n,e) = ( , ). Then break the system and put a fake digital signature $s'$ on the message m = , so that $(m, s')$ passes the verification with the public key $(n, e)$.

Exercise (Exercise 7 in Trappe and Washington) Naive Nelson uses RSA to receive a single ciphertext $c$, corresponding to the message $m$. His public modulus is $n$ and his public encryption exponent is $e$. Since he feels guilty that his system was used only once, he agrees to decrypt any ciphertext that someone sends him, as long as it is not $c$, and return the answer to that person. Eve sends him the ciphertext $2^e c \pmod n$. Show how this allows Eve to find $m$.

Exercise (Exercise 8 in Trappe and Washington) In order to increase security, Bob chooses $n$ and two encryption exponents $e_1$, $e_2$. He asks Alice to encrypt her message $m$ to him by first computing $c_1 \equiv m^{e_1} \pmod n$, then encrypting $c_1$ to get $c_2 \equiv c_1^{e_2} \pmod n$. Alice then sends $c_2$ to Bob.
Does this double encryption increase security over single encryption? Why or why not?

Exercise (Exercise 10 in Trappe and Washington) The exponents $e = 1$ and $e = 2$ should not be used in RSA. Why?
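The signing and verification procedure described in this section can be sketched in Python. The key $(n, e, d) = (3233, 17, 2753)$ is a toy key assumed for illustration (it comes from the primes 61 and 53), and the function names `sign` and `verify` are hypothetical.

```python
# Toy RSA key, assumed for illustration only: n = 61 * 53.
N, E, D = 3233, 17, 2753


def sign(m: int, d: int, n: int) -> int:
    """Signature s = m^d mod n, computed with the private exponent d."""
    return pow(m, d, n)


def verify(m: int, s: int, e: int, n: int) -> bool:
    """Accept the pair (m, s) when s^e = m (mod n)."""
    return pow(s, e, n) == m % n


s = sign(65, D, N)
ok = verify(65, s, E, N)        # genuine pair is accepted
forged = verify(66, s, E, N)    # same signature on another message is rejected
```

Only the holder of $d$ can produce $s$, while anyone who knows the public key $(n, e)$ can run `verify`.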

18 The Diffie-Hellman Key Exchange Protocol

There are many benefits to using RSA, but there is one big problem: despite the fact that it works in polynomial time, it is quite slow. For suppose that we want to compute $c \equiv m^e \pmod n$. The Double-and-Add Algorithm requires at most $\log e$ squarings and at most $\log e$ multiplications, thus resulting in at most $2\log e \le 2\log n$ arithmetic operations in total. Each multiplication involves numbers of size at most $\log n$. A fast multiplication algorithm, the Toom-Cook Algorithm, requires O((logn) ) steps to multiply two integers of size at most $\log n$. Since there are at most $2\log n$ multiplications, the encryption and decryption require O((logn) ) steps to compute. Roughly speaking, this means that if n is a 2048 bit number, then one can encrypt or decrypt messages in steps.

Private key cryptosystems (also referred to as symmetric ciphers or block ciphers) are much faster, because their execution does not involve any complex mathematical computations. Instead, in order to encrypt the message they use logical operations, such as AND, OR, NOT and XOR, as well as bit shifts and bit permutations. The Caesar cipher is an example of a cipher which uses only shifts, but on letters of the alphabet rather than on bits. Anagrams, like ehll!o, are examples of permutations on letters. These operations are very simple and in fact require only $O(1)$ steps to compute (compare it to multiplication, which requires O((logn) )). In the end, both encryption and decryption for these ciphers require $O(\log n)$ steps. The most widely deployed symmetric ciphers are 3-DES (Triple Data Encryption Standard) and AES (Advanced Encryption Standard), which is also commonly referred to as Rijndael.

As was mentioned in Section 17, in order to use private key cryptosystems two parties must agree on a secret key. So how can this be done when Alice and Bob are too far away from each other?
Here is one way: Alice generates a secret key $K$, encrypts it using RSA with Bob's public key, and then sends the encrypted message to Bob. Bob decrypts the message, and so now Alice and Bob share a secret $K$ in common. Then they may use whichever symmetric algorithm they want, such as 3-DES or AES.

But there is another way for Alice and Bob to agree on a common key. This procedure, called the Diffie-Hellman Key Exchange Protocol, was patented by Whitfield Diffie and Martin Hellman. Its security is based on the Discrete Logarithm Problem, and it works as follows. Alice generates a large prime number

$p$, an integer $g$ such that $0 \le g < p$, and an integer $x$ such that $1 \le x \le p - 2$. She computes $g^x \pmod p$, and then sends $p$, $g$ and $g^x \pmod p$ to Bob. When Bob receives $p$, $g$ and $g^x \pmod p$, he generates an integer $y$ such that $1 \le y \le p - 2$, computes $g^y \pmod p$, and then sends it back to Alice. Finally, since Alice knows $x$ and $g^y \pmod p$, she can compute
$$g^{xy} \equiv (g^y)^x \pmod p,$$
and since Bob knows $y$ and $g^x \pmod p$, he can compute
$$g^{xy} \equiv (g^x)^y \pmod p.$$
So in the end both Alice and Bob share a secret in common, namely $g^{xy} \pmod p$.

Why is this secure? If a malicious adversary Eve listened to the communication between Alice and Bob, she could intercept $p$, $g$, $g^x \pmod p$ and $g^y \pmod p$, and from this information she would have to compute $g^{xy} \pmod p$. This problem is called the Diffie-Hellman Problem, and it is no harder than the Discrete Logarithm Problem. That is, if Eve knew how to solve the Discrete Logarithm Problem, she would be able to solve the Diffie-Hellman Problem (see Assignment 3). However, it is not known whether these two problems are equivalent.

We do not know any polynomial time algorithm for computing discrete logarithms. The best known subexponential algorithm is due to Adleman and it utilizes index calculus. The discrete logarithm can be computed quite fast in some special cases, but if the parameters $p$, $g$, $x$ and $y$ are chosen properly, the problem becomes intractable for modern electronic computers. There are many things that need to be verified in order to ensure that the communication is secure, but we will just mention that the parameter $g$ should be chosen so that $\mathrm{ord}(g)$ in $\mathbb{Z}_p^*$ is sufficiently large. As a final remark, we would like to mention that there exists an efficient quantum algorithm for computing discrete logarithms, which was invented by Peter Shor in 1994.

19 Integer Factorization

The next computational problem that we address is the integer factorization problem.
That is, given a composite integer $n$, we would like to find a non-trivial divisor of $n$. Unlike for primality testing, we do not know any polynomial time algorithm for integer factorization. Many mathematicians believe that the integer

factorization problem is hard, and several cryptographic protocols, such as RSA, rely on this assumption. If you want to become a famous mathematician, try inventing a polynomial time algorithm for integer factorization. Note, however, that there exists an efficient quantum algorithm for integer factorization, which was invented by Peter Shor in 1994.

There are many algorithms for integer factorization. The most obvious one, trial division, we studied in Section 16. This algorithm allows us to factor an integer $n$ in $O(\sqrt{n}) = O(e^{(\log n)/2})$ steps, so it is exponential and is no good for factoring large integers. In this section, we will study two factorization algorithms, namely Fermat's Algorithm and its optimized variant, called Dixon's Algorithm. The former is an exponential algorithm and the latter is a subexponential algorithm. You will also learn about Euler's Factorization Method in the assignments.

19.1 Fermat's Factorization Method

Fermat's Factorization Method was suggested by the French mathematician Pierre de Fermat back in the 17th century. The idea is simple: given an integer $n$, the goal is to find integers $x$ and $y$ such that
$$n = x^2 - y^2.$$
Then
$$n = (x - y)(x + y),$$
and if neither $x - y$ nor $x + y$ is equal to 1, this results in a non-trivial factorization of $n$. Note that not every even number can be represented in this form (integers congruent to 2 modulo 4 cannot), but we may easily disregard even numbers from consideration, since every even number greater than 2 always has a non-trivial divisor equal to 2. Unlike even integers, odd integers can always be represented as a difference of two perfect squares, for if $n = kl$, then
$$n = \left(\frac{k + l}{2}\right)^2 - \left(\frac{k - l}{2}\right)^2.$$
Since $n$ is odd, so are $k$ and $l$, which means that both $(k + l)/2$ and $(k - l)/2$ are integers, too. If $n = kl$ is a multiple of 4, such a representation is also possible once we assume that both $k$ and $l$ are even. From the formula above it is also evident that there can be many representations of an integer as a difference of two perfect squares.

Let $\lceil x \rceil$ denote the smallest integer $\ge x$. We will now convert the observations made above into an algorithm:

1. Put $x := \lceil \sqrt{n} \rceil$ and then set $y := x^2 - n$;
2. If $y$ is a perfect square, return $x - \sqrt{y}$; otherwise proceed to Step 3;
3. Increase $x$ by 1 and then set $y := x^2 - n$;
4. Go back to Step 2.

Note that the algorithm always terminates. Furthermore, if the algorithm returns 1, then the number $n$ must be prime.

Example. Let us use Fermat's Algorithm to factorize $n = 8023$. Note that $\sqrt{n} \approx 89.57$, so we begin with $x = 90$ and $y = x^2 - n = 8100 - 8023 = 77$. We see that

x                   90    91    92
y                   77    258   441
√y is an integer?   no    no    yes

Since $\sqrt{441} = 21$, we see that
$$8023 = 92^2 - 21^2 = (92 - 21)(92 + 21) = 71 \cdot 113.$$
Thus Fermat's Factorization Algorithm terminated in just three steps, resulting in a non-trivial factor $x - \sqrt{y} = 92 - 21 = 71$.

Exercise Use Fermat's Algorithm to factor integers 4747 and

Now let us analyze the performance of the algorithm above. We will count a single computation of $x$ and $y$ as one step. If $n = kl$ and $k$ is the largest divisor of $n$ such that $k \le \sqrt{n}$, then Fermat's Algorithm will return $k$ as a result. In this case, $x = (k + l)/2$, which means that the number of steps required for the computation is equal to
$$\frac{k + l}{2} - \lceil \sqrt{n} \rceil.$$
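Steps 1 to 4 of the algorithm can be sketched in Python as follows. The function name `fermat_factor` is hypothetical; the sketch assumes that $n$ is an odd integer with $n \ge 3$ and uses `math.isqrt` to test for perfect squares.

```python
from math import isqrt


def fermat_factor(n: int) -> int:
    """Return the divisor x - sqrt(y) of an odd n >= 3, where n = x^2 - y."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer >= 3")
    x = isqrt(n)
    if x * x < n:
        x += 1  # Step 1: x = ceil(sqrt(n))
    while True:
        y = x * x - n
        r = isqrt(y)
        if r * r == y:  # Step 2: y is a perfect square
            return x - r
        x += 1          # Step 3, then back to Step 2


k = fermat_factor(8023)
```

For $n = 8023$ the loop visits $x = 90, 91, 92$ and returns $92 - 21 = 71$; for a prime input such as 13 it returns 1.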

We can bound this quantity from above as follows:
$$\frac{k + l}{2} - \lceil \sqrt{n} \rceil \le \frac{k + l}{2} - \sqrt{n} = \frac{(\sqrt{k} - \sqrt{l})^2}{2} = \frac{(\sqrt{n} - k)^2}{2k}.$$
We see that, if $n$ is prime, then $k = 1$ and the algorithm requires $O(n)$ steps to compute. Therefore, in its worst case, the algorithm is exponential. Note that it is even worse than trial division, because trial division requires $O(\sqrt{n})$ steps to compute.

Why do we care then about Fermat's Factorization Method? First of all, in some special cases it performs really well. For suppose that $k$ satisfies $\sqrt{n} - k \le 2n^{1/4}$, so it is relatively close to $\sqrt{n}$. Then for all $n > 6^4$ it is the case that
$$\frac{(\sqrt{n} - k)^2}{2k} \le \frac{4\sqrt{n}}{2(\sqrt{n} - 2n^{1/4})} = \frac{2}{1 - 2n^{-1/4}} < 3,$$
which means that Fermat's Algorithm terminates in two steps! Of course, this is much faster than if we used trial division.

This is why Fermat's Factorization Method is usually used in combination with the Trial Division Method. First one chooses a constant $c > \sqrt{n}$ and then Fermat's Algorithm is used to look for divisors between $\sqrt{n}$ and $c$. After that, one only has to check prime divisors of $n$ with the trial division method up to $c - \sqrt{c^2 - n}$ instead of $\sqrt{n}$. Even though this observation does not allow us to push the bound below $O(n^{1/2})$, it does allow us to decrease the constant implicit in the big-O notation significantly. Further improvements can be made through sieving, and in 1974 Lehman managed to combine all of the improvements and invented a factorization algorithm based on Fermat's Factorization Method and trial division with asymptotic running time $O(n^{1/3})$.

Though Fermat's Algorithm can be quite slow in its worst case, it lies in the foundation of the best factorization algorithms known to date, namely the

quadratic sieve and the generalized number field sieve, which have subexponential asymptotic running time. Both of these algorithms evolved from the factorization method due to Dixon.

19.2 Dixon's Factorization Method

Dixon's Factorization Method was proposed in 1981 by the Canadian mathematician John D. Dixon, who is a professor emeritus at Carleton University, Ottawa. Recall that in Fermat's Factorization Method we were choosing an integer $x$ between 0 and $n$ and then evaluating $x^2 \pmod n$, hoping that the result would be a perfect square; that is, $x^2 \equiv y^2 \pmod n$. Unfortunately, up to $n$, there are only $\lfloor \sqrt{n} \rfloor$ perfect squares, and so for very large $n$ the total proportion of perfect squares less than $n$ tends to zero:
$$\frac{\lfloor \sqrt{n} \rfloor}{n} \le \frac{\sqrt{n}}{n} = \frac{1}{\sqrt{n}} \to 0.$$

Dixon's method suggests that, instead of looking for a perfect square, we can actually construct it from many random samples. The idea is as follows: by picking distinct $x_1, x_2, \ldots$ between 0 and $n$ at random, we obtain relations of the form
$$x_1^2 \equiv z_1 \pmod n, \quad x_2^2 \equiv z_2 \pmod n, \quad \ldots$$
where $z_1, z_2, \ldots$ are integers between 0 and $n$. One would then hope to select relations $i_1, i_2, \ldots, i_r$ so that the number
$$z_{i_1} z_{i_2} \cdots z_{i_r} = y^2$$
is a perfect square. But then
$$(x_{i_1} x_{i_2} \cdots x_{i_r})^2 \equiv y^2 \pmod n,$$
which means that one can compute a divisor $d$ of $n$ by evaluating
$$d = \gcd(x_{i_1} x_{i_2} \cdots x_{i_r} - y, n).$$
If it so happens that $d = 1$ or $d = n$, we construct a new set of random samples, or select a different $r$-tuple $i_1, i_2, \ldots, i_r$ with the property described above.

Now the main question is, how do we construct congruences $x_i^2 \equiv z_i \pmod n$, from which we can produce a non-trivial perfect square? The main idea here is to pick only those $x_i$'s for which the resulting values of the $z_i$'s are so-called B-smooth numbers.

Definition. Let $B \ge 2$ be a real number. An integer $n$ is called B-smooth if for any prime $p \mid n$ it is the case that $p \le B$.

Example. The numbers 2, 3, 4, 5, 6, 8, 9, 10, 12 are all 5-smooth. The reason is that every prime $p$ dividing an integer from that list satisfies $p \le 5$. The numbers 7 and 11, however, are not 5-smooth, but they are both 11-smooth.

Now every time we choose a random $x$ and then evaluate $z \equiv x^2 \pmod n$ such that $0 \le z < n$, we need to verify that $z$ is B-smooth. One can check that a given number $z$ is B-smooth in just $O(B)$ steps using trial division. Note that, if $p_1 < p_2 < \ldots < p_k$ are all the prime numbers $\le B$, then every B-smooth number can be written in the form
$$z = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k},$$
where $e_1, e_2, \ldots, e_k$ are non-negative integers. Thus we obtain a vector $\vec{v} = (e_1, e_2, \ldots, e_k)$ in $\mathbb{Z}^k$. Further, we can reduce the elements of this vector modulo 2, thus obtaining a vector $\tilde{v} = (\tilde{e}_1, \tilde{e}_2, \ldots, \tilde{e}_k)$ in $\mathbb{Z}_2^k$ with $\tilde{e}_1, \tilde{e}_2, \ldots, \tilde{e}_k \in \{0, 1\}$. Because $\mathbb{Z}_2$ forms a field (that is, division by a non-zero element is always allowed), the set $\mathbb{Z}_2^k$ constitutes a $k$-dimensional vector space over $\mathbb{Z}_2$, which means that we can analyze it from the perspective of linear algebra. In particular, any collection of $k + 1$ vectors in $\mathbb{Z}_2^k$ will always be linearly dependent.

Now suppose that for distinct values $x_1, x_2, \ldots, x_{k+1}$ we managed to compute B-smooth values $z_1, z_2, \ldots, z_{k+1}$, which correspond to vectors $\tilde{v}_1, \tilde{v}_2, \ldots, \tilde{v}_{k+1}$ in $\mathbb{Z}_2^k$. Since $\mathbb{Z}_2^k$ has dimension $k$, it must be the case that the vectors $\tilde{v}_1, \tilde{v}_2, \ldots, \tilde{v}_{k+1}$ are linearly dependent in $\mathbb{Z}_2^k$. But then there must exist indices $i_1, i_2, \ldots, i_r$ for some $r \le k + 1$ such that
$$\tilde{v}_{i_1} + \tilde{v}_{i_2} + \cdots + \tilde{v}_{i_r} \equiv \vec{0} \pmod 2,$$
which means that $z_{i_1} z_{i_2} \cdots z_{i_r}$ is a perfect square. In order to find such linearly dependent vectors $\tilde{v}_{i_1}, \tilde{v}_{i_2}, \ldots, \tilde{v}_{i_r}$ in $\mathbb{Z}_2^k$, we row reduce the $(k+1) \times k$ matrix $M = [\tilde{v}_1, \tilde{v}_2, \ldots, \tilde{v}_{k+1}]^T$, whose coefficients belong to $\mathbb{Z}_2$. Note that the row reduction requires $O(k^3) = O(B^3)$ steps.
At this point, we can compute the value
$$d = \gcd\left(x_{i_1} x_{i_2} \cdots x_{i_r} - \sqrt{z_{i_1} z_{i_2} \cdots z_{i_r}},\ n\right)$$
and, in case $d = 1$ or $d = n$, repeat the procedure of choosing distinct random values $x_1, x_2, \ldots, x_{k+1}$ once again.
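The smoothness check and the final gcd computation can be sketched in Python as follows. This is only a partial illustration of Dixon's method: the relations are assumed to be given and already known to be linearly dependent modulo 2, so the random sampling and the row reduction are not shown. The modulus 84923 and the values 513 and 537 are hypothetical inputs chosen for this sketch, and the function names are assumptions; `math.prod` requires Python 3.8 or later.

```python
from math import gcd, prod


def exponent_vector(z: int, primes):
    """Return the exponents of z over `primes`, or None if z is not smooth."""
    if z <= 0:
        return None
    exps = []
    for p in primes:
        e = 0
        while z % p == 0:  # trial division by p
            z //= p
            e += 1
        exps.append(e)
    return exps if z == 1 else None


def combine(xs, n: int, primes) -> int:
    """Given x's whose exponent vectors sum to 0 mod 2, return gcd(prod(x) - y, n)."""
    vectors = [exponent_vector(x * x % n, primes) for x in xs]
    if any(v is None for v in vectors):
        raise ValueError("some x^2 mod n is not smooth")
    total = [sum(column) for column in zip(*vectors)]
    if any(e % 2 for e in total):
        raise ValueError("the product of the z's is not a perfect square")
    y = prod(p ** (e // 2) for p, e in zip(primes, total))  # y^2 = product of z's
    return gcd(prod(xs) - y, n)


d = combine([513, 537], 84923, [2, 3, 5, 7])
```

Here $513^2 \equiv 8400$ and $537^2 \equiv 33600 \pmod{84923}$ are both 7-smooth and their product equals $16800^2$, so the gcd yields the non-trivial divisor 163 of $84923 = 163 \cdot 521$.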

The only thing that is left for us to establish is the value of $B$. As it turns out, the optimal choice for $B$ is
$$B = e^{O(\sqrt{\log n \log\log n})},$$
so the asymptotic running time of Dixon's algorithm is subexponential.

Exercise In this exercise, we will use Dixon's method to find a non-trivial factor of 34081. (a) Factorize integers 15, 486, to ensure that they are all 7-smooth; (b) Suppose that the execution of Dixon's Factorization Algorithm allowed us to locate the congruences (mod 34081); (mod 34081); (mod 34081). Using the above congruences, as well as the factorizations obtained in Part (a), find integers $x$ and $y$ such that $x^2 \equiv y^2 \pmod{34081}$, and then use these $x$ and $y$ to compute a non-trivial factor of 34081.

20 Quadratic Residues

Let $n \ge 3$ be a modulus and $a, b, c$ be arbitrary integers. We will now turn our attention to quadratic congruences
$$ax^2 + bx + c \equiv 0 \pmod n.$$
We require that $n \nmid a$, for otherwise the above congruence would reduce to the linear congruence $bx + c \equiv 0 \pmod n$. Also, if $n = 2$, by Fermat's Little Theorem $x^2 \equiv x \pmod 2$ regardless of $x$. Thus
$$ax^2 + bx + c \equiv (a + b)x + c \pmod 2,$$
so once again we obtain a linear congruence. Thus it is reasonable to assume that $n \ge 3$. Finally, for the simplicity of exposition, we will assume that $n$ is an odd prime, and we will indicate that by writing $p$ instead of $n$. Note that $p - 1$ is even, so $\frac{p-1}{2}$ is an integer.

In this section, we will not aim to solve quadratic congruences. Instead, we will investigate when solutions exist. Note that it follows from Proposition 12.2 that the polynomial $[a][x]^2 + [b][x] + [c]$ has at most 2 roots in $\mathbb{Z}_p$.

Proposition 20.1. [25] Let $p$ be an odd prime, and $a$, $b$, $c$ be integers where $p \nmid a$. The quadratic congruence
$$ax^2 + bx + c \equiv 0 \pmod p$$
has a solution $x$ if and only if the congruence
$$y^2 \equiv b^2 - 4ac \pmod p$$
has a solution $y$. In that case, $y \equiv 2ax + b \pmod p$.

Proof. Multiply both sides of the quadratic congruence by $4a$ to get
$$4a^2x^2 + 4abx + 4ac \equiv 0 \pmod p.$$
This can be rewritten as
$$(2ax + b)^2 - b^2 + 4ac \equiv 0 \pmod p,$$
which is the same as
$$(2ax + b)^2 \equiv b^2 - 4ac \pmod p.$$
Conversely, suppose that $y$ is a solution to $y^2 \equiv b^2 - 4ac \pmod p$. Note that we can solve the linear congruence $2ax + b \equiv y \pmod p$ for $x$, because $[2a]$ is a unit in $\mathbb{Z}_p$. Thus
$$(2ax + b)^2 \equiv y^2 \equiv b^2 - 4ac \pmod p,$$
which is the same as
$$4a^2x^2 + 4abx + 4ac \equiv 0 \pmod p.$$
Since $[4a]$ is a unit in $\mathbb{Z}_p$, we can multiply both sides of the above congruence by $(4a)^{-1} \pmod p$ in order to obtain
$$ax^2 + bx + c \equiv 0 \pmod p.$$
Therefore $x$ which satisfies $2ax + b \equiv y \pmod p$ is a solution to the original quadratic congruence.

[25] Proposition 6.1 in Frank Zorzitto, A Taste of Number Theory.

Proposition 20.1 tells us that solving the quadratic congruence $ax^2 + bx + c \equiv 0 \pmod p$ is equivalent to solving a simplified quadratic congruence
$$x^2 \equiv d \pmod p,$$
where $d = b^2 - 4ac$. The integer $d$ is called the discriminant of the quadratic polynomial $ax^2 + bx + c$. Thus, in order to find solutions to $x^2 \equiv d \pmod p$, we need to understand which residue classes of $\mathbb{Z}_p$ are squares.

Definition. A residue $\alpha$ in $\mathbb{Z}_p$ is called a quadratic residue when $\alpha \in \mathbb{Z}_p^*$ and $\alpha = \beta^2$ for some other residue $\beta$ in $\mathbb{Z}_p$. If such $\beta$ does not exist, then $\alpha$ is called a quadratic nonresidue. When translated to the language of congruences, we say that an integer $a$ is a quadratic residue (or has a quadratic residue) modulo an odd prime $p$ if $p \nmid a$ and $a \equiv x^2 \pmod p$ for some integer $x$.

Example. Let us find all quadratic residues in $\mathbb{Z}_{13}$. We note that

[1]^2 = [1]     [7]^2 = [10]
[2]^2 = [4]     [8]^2 = [12]
[3]^2 = [9]     [9]^2 = [3]
[4]^2 = [3]     [10]^2 = [9]
[5]^2 = [12]    [11]^2 = [4]
[6]^2 = [10]    [12]^2 = [1]

Thus the quadratic residues are [1], [3], [4], [9], [10], [12].

Exercise Determine all quadratic residues in $\mathbb{Z}_{17}$, $\mathbb{Z}_{19}$ and $\mathbb{Z}_{23}$.

Proposition 20.5. Let $p$ be an odd prime. Then the group of units $\mathbb{Z}_p^*$ has exactly $(p-1)/2$ quadratic residues and exactly $(p-1)/2$ quadratic nonresidues.

Proof. Note that, for any $[a]$ in $\mathbb{Z}_p^*$, it is the case that $[a]^2 = (-[a])^2$. Thus it is sufficient to look at $a$'s such that $1 \le a \le (p-1)/2$. We now claim that all the elements in the collection
$$[1]^2, [2]^2, \ldots, \left[\frac{p-1}{2}\right]^2$$

are distinct. Suppose not, and $[a]^2 = [b]^2 = [c]$ for some residue $[c]$ and some distinct $a$, $b$ with $1 \le a < b \le (p-1)/2$. Then both $[a]$ and $[b]$ are roots of the polynomial $X^2 - [c]$ in $\mathbb{Z}_p$. By Proposition 12.2, such a polynomial has at most 2 roots in $\mathbb{Z}_p$. However, we see that it has at least 4 distinct roots, namely $\pm[a]$ and $\pm[b]$. Thus we obtain a contradiction. Therefore the above collection has no repetitions, so $\mathbb{Z}_p^*$ contains exactly $(p-1)/2$ quadratic residues. Since every element of $\mathbb{Z}_p^*$ which is not a residue is a nonresidue, we conclude that there are exactly $(p-1)/2$ nonresidues.

Definition. For an odd prime $p$ and an integer $a$ coprime with $p$, we let
$$\left(\frac{a}{p}\right) := \begin{cases} +1 & \text{if } a \text{ is a quadratic residue modulo } p; \\ -1 & \text{if } a \text{ is a quadratic nonresidue modulo } p. \end{cases}$$
The symbol $\left(\frac{a}{p}\right)$ is called the Legendre symbol for $a$ modulo $p$.

Example Note that ( ) ( ) 8 6 = +1 while = Also, for any odd prime $p$ it is clear that 1 is a quadratic residue, i.e. $\left(\frac{1}{p}\right) = +1$. However, the value of $\left(\frac{-1}{p}\right)$ varies with $p$. For example, $\left(\frac{-1}{13}\right) = +1$ while $\left(\frac{-1}{19}\right) = -1$.

We will now give an alternative proof of Proposition 20.5 using primitive roots.

Proof. (of Proposition 20.5) Since $p$ is an odd prime, it follows from the Primitive Root Theorem that there exists a primitive root $\gamma$ in $\mathbb{Z}_p^*$. That is, for every residue $\alpha$ in $\mathbb{Z}_p^*$ there exists an integer $j$, $1 \le j \le p - 1$, such that $\alpha = \gamma^j$. First of all, let us demonstrate that it is impossible to represent $\alpha$ by both odd and even powers of $\gamma$. For suppose that $\alpha = \gamma^i = \gamma^j$ for some $1 \le i \le j$. Then $\gamma^{j-i} = 1$. By Proposition 13.3, $\mathrm{ord}(\gamma) \mid j - i$. Since $\mathrm{ord}(\gamma) = p - 1$, we conclude that the even number $p - 1$ divides $j - i$. But then it means that either both $i$ and $j$ are odd or both $i$ and $j$ are even. Now recall that, since $\gamma$ is a primitive root in $\mathbb{Z}_p^*$, the elements $\gamma, \gamma^2, \ldots, \gamma^{p-1}$ are distinct, and half of them are even powers of $\gamma$. These are the quadratic residues. On the other hand, all odd powers of $\gamma$ are quadratic nonresidues.
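Both proofs of Proposition 20.5 can be checked numerically for a small prime. The following Python sketch lists the quadratic residues once by squaring $1, 2, \ldots, (p-1)/2$ and once as even powers of a primitive root. The function names are hypothetical, and the choice of $g = 2$ as a primitive root modulo 13 is an assumption of this example (its powers $2, 4, 8, 3, 6, 12, 11, 9, 5, 10, 7, 1$ run through all of $\mathbb{Z}_{13}^*$).

```python
def residues_by_squaring(p: int):
    """Squares of 1, 2, ..., (p - 1) / 2 modulo the odd prime p."""
    return sorted({a * a % p for a in range(1, (p - 1) // 2 + 1)})


def residues_by_even_powers(g: int, p: int):
    """Even powers g^2, g^4, ..., g^(p-1) of a primitive root g modulo p."""
    return sorted({pow(g, j, p) for j in range(2, p, 2)})


squares = residues_by_squaring(13)
even_powers = residues_by_even_powers(2, 13)
```

Both lists coincide with the residues [1], [3], [4], [9], [10], [12] found for $\mathbb{Z}_{13}$, and each has $(13 - 1)/2 = 6$ elements.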

Proposition. Let $p$ be an odd prime and let $\alpha$ and $\beta$ be elements of $\mathbb{Z}_p^*$. Then

1. If $\alpha$ and $\beta$ are quadratic residues then $\alpha\beta$ is a quadratic residue;
2. If $\alpha$ is a quadratic residue and $\beta$ is a quadratic nonresidue then $\alpha\beta$ is a quadratic nonresidue;
3. If $\alpha$ and $\beta$ are quadratic nonresidues then $\alpha\beta$ is a quadratic residue.

Proof. Since $p$ is an odd prime, it follows from the Primitive Root Theorem that there exists a primitive root $\gamma$ in $\mathbb{Z}_p^*$. Then $\alpha = \gamma^i$ and $\beta = \gamma^j$, so $\alpha\beta = \gamma^{i+j}$. Now, as we saw in the second proof of Proposition 20.5, if $\alpha$ and $\beta$ are quadratic residues then both $i$ and $j$ are even, which means that $i + j$ is even as well. Therefore $\alpha\beta = (\gamma^{(i+j)/2})^2$ is a quadratic residue. We can prove the other two statements analogously.

The propositions above suggest one algorithm for calculating the Legendre symbol $\left(\frac{a}{p}\right)$. First, we need to find a primitive root $\gamma$ in $\mathbb{Z}_p^*$ and then determine the parity of $x$ in $\gamma^x = [a]$. Fortunately, Euler came up with a much simpler procedure.

Proposition (Euler's Test) [26] If $p$ is an odd prime and $a$ is an integer such that $p \nmid a$, then
$$\left(\frac{a}{p}\right) \equiv a^{\frac{p-1}{2}} \pmod p.$$
In other words, if $a$ is a quadratic residue, then $a^{\frac{p-1}{2}} \equiv 1 \pmod p$, and if $a$ is a quadratic nonresidue, then $a^{\frac{p-1}{2}} \equiv -1 \pmod p$.

Proof. Let $[b]$ be a primitive root in $\mathbb{Z}_p^*$. Suppose that $a$ is a quadratic residue. Then $a \equiv b^{2j} \pmod p$ for some non-negative integer $j$. Thus
$$a^{\frac{p-1}{2}} \equiv \left(b^{2j}\right)^{\frac{p-1}{2}} \equiv b^{(p-1)j} \equiv (b^j)^{p-1} \equiv 1 \pmod p.$$
Since $\left(\frac{a}{p}\right) = +1$ in this case, the congruence holds, as claimed.

[26] Proposition 6.8 in Frank Zorzitto, A Taste of Number Theory.

Now suppose that $a$ is a quadratic nonresidue. Then
$$a \equiv b^{2j+1} \pmod p$$
for some non-negative integer $j$. Then
$$a^{\frac{p-1}{2}} \equiv \left(b^{2j+1}\right)^{\frac{p-1}{2}} \equiv b^{\frac{p-1}{2}} b^{(p-1)j} \equiv b^{\frac{p-1}{2}} \pmod p.$$
Note that
$$\left(b^{\frac{p-1}{2}}\right)^2 \equiv b^{p-1} \equiv 1 \pmod p,$$
so the residue class $\left[b^{\frac{p-1}{2}}\right]$ is a root of the polynomial $X^2 - 1$ in $\mathbb{Z}_p$. Since $p$ is an odd prime, by Proposition 12.2, this polynomial has at most two roots. In fact, it has exactly two roots, namely $X = \pm[1]$. Therefore $b^{\frac{p-1}{2}} \equiv \pm 1 \pmod p$. Note that it cannot happen that $b^{\frac{p-1}{2}} \equiv 1 \pmod p$, because then the order of $[b]$ would be strictly less than $p - 1 = \varphi(p)$, which contradicts the fact that $[b]$ is a primitive root in $\mathbb{Z}_p^*$. Therefore $b^{\frac{p-1}{2}} \equiv -1 \pmod p$, and so we conclude that, when $a$ is a quadratic nonresidue, $a^{\frac{p-1}{2}} \equiv -1 \pmod p$. Therefore for any $a$ such that $p \nmid a$ it is the case that $a^{\frac{p-1}{2}} \equiv \left(\frac{a}{p}\right) \pmod p$.

Corollary. [27] The integer $-1$ is a quadratic residue modulo an odd prime $p$ if and only if $p \equiv 1 \pmod 4$.

Proof. By Euler's Test,
$$\left(\frac{-1}{p}\right) \equiv (-1)^{\frac{p-1}{2}} \pmod p.$$

[27] Proposition 6.10 in Frank Zorzitto, A Taste of Number Theory.

Since both sides of the above congruence are equal to $\pm 1$ and $p$ is odd, this congruence is actually an equality. The result then follows from the fact that
$$(-1)^{\frac{p-1}{2}} = \begin{cases} 1 & \text{if } p \equiv 1 \pmod 4; \\ -1 & \text{if } p \equiv 3 \pmod 4. \end{cases}$$

Example. Is $a = 138$ a quadratic residue modulo $p = 557$? We use Euler's Test to answer this question. Note that $\frac{p-1}{2} = 278$. We can now compute $a^{\frac{p-1}{2}} \pmod p$ using the Double-and-Add Algorithm:
$$138^{278} \equiv -1 \pmod{557}.$$
Therefore 138 is a quadratic nonresidue modulo 557.

Exercise Compute ( ) ( , 364 ) ( 503 and ) using Euler's Test.

At the end of this section, let us take a look at one curious application of the theory of quadratic residuosity.

Proposition. [28] There are infinitely many primes congruent to 1 modulo 4.

Proof. Suppose we have a finite list of primes $p_1, p_2, \ldots, p_n$ congruent to 1 modulo 4. We will show how to produce yet another prime congruent to 1 modulo 4 that is not on this list. Let
$$x = (2 p_1 p_2 \cdots p_n)^2 + 1.$$
Let $q$ be any prime factor of $x$. If $q \in \{2, p_1, p_2, \ldots, p_n\}$, then $q \mid 1$, which is impossible. Since $q$ divides $x$, we see that
$$-1 \equiv (2 p_1 p_2 \cdots p_n)^2 \pmod q,$$
which means that $-1$ is a quadratic residue modulo $q$. But then it follows from the Corollary above that $q \equiv 1 \pmod 4$. Thus we were able to produce one more prime which is not in the original list of primes. Repeating this procedure yet another time but with the list $p_1, p_2, \ldots, p_n, p_{n+1} = q$, we can produce one more prime congruent to 1 modulo 4, and so on. Hence we can generate infinitely many distinct primes that are congruent to 1 modulo 4.

[28] Proposition 6.11 in Frank Zorzitto, A Taste of Number Theory.
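Euler's Test translates directly into a short Python function. This is a minimal sketch: the name `legendre` is hypothetical, the modulus is assumed to be an odd prime (this is not checked), and the built-in `pow` is used for the modular exponentiation.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, computed with Euler's Test."""
    a %= p
    if a == 0:
        raise ValueError("a must not be divisible by p")
    r = pow(a, (p - 1) // 2, p)  # equals 1 or p - 1 when p is an odd prime
    return 1 if r == 1 else -1


symbol = legendre(138, 557)
residues_mod_13 = [a for a in range(1, 13) if legendre(a, 13) == 1]
```

The call `legendre(138, 557)` returns $-1$, in agreement with the example above, and `legendre(-1, p)` is $+1$ for $p = 13$ and $-1$ for $p = 19$, as predicted by the Corollary.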

21 The Law of Quadratic Reciprocity

Let $p \ge 3$ be prime and $a$ be an integer such that $p \nmid a$. We have already seen several approaches for computing $\left(\frac{a}{p}\right)$, for example Euler's Test. In this section, we will investigate one more approach invented by Gauss. In fact, he established what we now call the Law of Quadratic Reciprocity, which encapsulates very important properties of quadratic residues. We begin by proving the following proposition on the multiplicativity of the Legendre symbol.

Proposition 21.1. [29] The Legendre symbol is multiplicative. That is, if $p$ is an odd prime and $a$, $b$ are integers coprime to $p$, then
$$\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right).$$
Furthermore, if $a \equiv b \pmod p$, then
$$\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right).$$

Proof. The second statement is obvious because the residue is the same for all congruent integers. To prove that $\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right)$ for any $a$ and $b$ coprime to $p$, we apply Euler's Test (see Proposition 20.9):
$$\left(\frac{a}{p}\right)\left(\frac{b}{p}\right) \equiv a^{\frac{p-1}{2}} b^{\frac{p-1}{2}} \equiv (ab)^{\frac{p-1}{2}} \equiv \left(\frac{ab}{p}\right) \pmod p.$$
Since $\left(\frac{a}{p}\right)\left(\frac{b}{p}\right) = \pm 1$ and $\left(\frac{ab}{p}\right) = \pm 1$ and these two integers are congruent modulo the odd prime $p$, they have to be identical.

By the Fundamental Theorem of Arithmetic, every positive integer $a > 1$ is a product of primes. That is, $a = q_1 q_2 \cdots q_n$ for some primes $q_1, q_2, \ldots, q_n$ with repetitions allowed. By Proposition 21.1,
$$\left(\frac{a}{p}\right) = \left(\frac{q_1}{p}\right)\left(\frac{q_2}{p}\right) \cdots \left(\frac{q_n}{p}\right).$$

[29] Proposition 6.15 in Frank Zorzitto, A Taste of Number Theory.

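Also, if $a$ is a negative integer, then $a = -1 \cdot b$ for some positive integer $b$, which means that
\[ \left(\frac{a}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{b}{p}\right). \]
We conclude that, in order to determine the value of $\left(\frac{a}{p}\right)$, one has to explore the values of $\left(\frac{q}{p}\right)$ for distinct primes $p$ and $q$. Essentially, for any fixed prime $q$, the Law of Quadratic Reciprocity allows us to understand which values the Legendre symbol $\left(\frac{q}{p}\right)$ takes as the odd prime $p$ varies. As a very simple example, let us explore the case $q = 2$.

Proposition (Proposition 6.14 in Frank Zorzitto, A Taste of Number Theory). If $p$ is an odd prime, then
\[ \left(\frac{2}{p}\right) = \begin{cases} +1 & \text{if } p \equiv 1, 7 \pmod 8; \\ -1 & \text{if } p \equiv 3, 5 \pmod 8. \end{cases} \]

Proof. Suppose $p = 8k + 1$ for some positive integer $k$. There are $4k = \frac{p-1}{2}$ even integers between 1 and $p$, namely
\[ 2, 4, 6, \ldots, 4k-2, 4k, 4k+2, 4k+4, \ldots, 8k-2, 8k. \]
Let us compute their product:
\begin{align*}
x &= 2 \cdot 4 \cdot 6 \cdots (4k-2) \cdot (4k) \cdot (4k+2) \cdot (4k+4) \cdots (8k-2) \cdot (8k) \\
&= 2^{4k} \left(1 \cdot 2 \cdot 3 \cdots (2k) \cdot (2k+1) \cdot (2k+2) \cdots (4k-1) \cdot (4k)\right) \\
&= 2^{4k} (4k)!.
\end{align*}
However,
\begin{align*}
4k + 2 &\equiv 1 - 4k \pmod p \\
4k + 4 &\equiv 3 - 4k \pmod p \\
&\;\;\vdots \\
8k - 2 &\equiv -3 \pmod p \\
8k &\equiv -1 \pmod p.
\end{align*}
Using the above information, we can compute $x \pmod p$ as follows:
\begin{align*}
x &\equiv 2 \cdot 4 \cdots (4k-2) \cdot (4k) \cdot (1-4k) \cdot (3-4k) \cdot (5-4k) \cdots (-3) \cdot (-1) \\
&\equiv (-1)^{2k} \cdot 2 \cdot 4 \cdots (4k-2) \cdot (4k) \cdot (4k-1) \cdot (4k-3) \cdot (4k-5) \cdots 3 \cdot 1 \\
&\equiv (-1)^{2k} (4k)! \\
&\equiv (4k)! \pmod p.
\end{align*}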

We conclude that
\[ 2^{4k} (4k)! \equiv (4k)! \pmod p. \]
After cancelling $(4k)!$ on both sides we obtain
\[ 2^{\frac{p-1}{2}} = 2^{4k} \equiv 1 \pmod p. \]
By Euler's Test, the integer 2 is a quadratic residue modulo $p$. The cases $p \equiv 3, 5, 7 \pmod 8$ can be studied analogously and are left as an exercise to the reader.

Since we managed to understand how $\left(\frac{q}{p}\right)$ behaves for fixed $q = 2$, one would hope that such a result can be established for all other primes. Indeed, this can be achieved with the Law of Quadratic Reciprocity, proved by the German mathematician Carl Friedrich Gauss at the age of 19.

Theorem (Gauss's Law of Quadratic Reciprocity; Theorem 6.16 in Frank Zorzitto, A Taste of Number Theory). Let $p$ and $q$ be distinct odd prime numbers. Then
\[ \left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}. \]
In other words,
\[ \left(\frac{q}{p}\right) = \begin{cases} \left(\frac{p}{q}\right) & \text{if } p \equiv 1 \pmod 4 \text{ or } q \equiv 1 \pmod 4; \\ -\left(\frac{p}{q}\right) & \text{if } p \equiv 3 \pmod 4 \text{ and } q \equiv 3 \pmod 4. \end{cases} \]

The proof is quite non-trivial, so due to time limitations we will not present it in class or in these notes. If you would like to see the proof, see Section 6.4 in Frank Zorzitto, A Taste of Number Theory.

Example. Let us examine how the value of $\left(\frac{3}{p}\right)$ depends on the odd prime $p \neq 3$. By the Law of Quadratic Reciprocity,
\[ \left(\frac{3}{p}\right)\left(\frac{p}{3}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{3-1}{2}} = (-1)^{\frac{p-1}{2}}. \]
Multiplying both sides of the above equality by $\left(\frac{p}{3}\right)$, we obtain
\[ \left(\frac{3}{p}\right) = (-1)^{\frac{p-1}{2}} \left(\frac{p}{3}\right). \]
Now there are two cases to consider:

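1. Suppose that $p \equiv 1 \pmod 4$. Then $\left(\frac{3}{p}\right) = \left(\frac{p}{3}\right)$, so the value of $\left(\frac{3}{p}\right)$ depends on the congruence class of $p$ modulo 3. Note that $\left(\frac{1}{3}\right) = +1$ and $\left(\frac{2}{3}\right) = -1$. We conclude that $\left(\frac{3}{p}\right) = +1$ if
\[ \begin{cases} p \equiv 1 \pmod 4; \\ p \equiv 1 \pmod 3, \end{cases} \]
and $\left(\frac{3}{p}\right) = -1$ if
\[ \begin{cases} p \equiv 1 \pmod 4; \\ p \equiv 2 \pmod 3. \end{cases} \]
Since 3 and 4 are coprime, we can apply the Chinese Remainder Theorem to conclude that $\left(\frac{3}{p}\right) = +1$ when $p \equiv 1 \pmod{12}$ and $\left(\frac{3}{p}\right) = -1$ when $p \equiv 5 \pmod{12}$.

2. Analogously, we can analyze the case $p \equiv 3 \pmod 4$. We have $\left(\frac{3}{p}\right) = -\left(\frac{p}{3}\right)$, which means that $\left(\frac{3}{p}\right) = +1$ if
\[ \begin{cases} p \equiv 3 \pmod 4; \\ p \equiv 2 \pmod 3, \end{cases} \]
and $\left(\frac{3}{p}\right) = -1$ if
\[ \begin{cases} p \equiv 3 \pmod 4; \\ p \equiv 1 \pmod 3. \end{cases} \]
Applying the Chinese Remainder Theorem, we see that $\left(\frac{3}{p}\right) = +1$ when $p \equiv 11 \pmod{12}$ and $\left(\frac{3}{p}\right) = -1$ when $p \equiv 7 \pmod{12}$.

We conclude that
\[ \left(\frac{3}{p}\right) = \begin{cases} +1 & \text{if } p \equiv 1, 11 \pmod{12}; \\ -1 & \text{if } p \equiv 5, 7 \pmod{12}. \end{cases} \]

Exercise. Determine for which odd primes $p$ the Legendre symbols $\left(\frac{\pm 5}{p}\right)$ and $\left(\frac{\pm 7}{p}\right)$ are equal to $+1$ or $-1$.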
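The two classification rules obtained in this section, for $\left(\frac{2}{p}\right)$ modulo 8 and for $\left(\frac{3}{p}\right)$ modulo 12, can be sanity-checked numerically. The Python sketch below is an illustration only, not part of the original notes: the helper names are hypothetical, and the Legendre symbol is evaluated with Euler's Test rather than with reciprocity.

```python
def is_prime(n: int) -> bool:
    """Primality by trial division; adequate for the small range used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p via Euler's Test."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def check_two(limit: int) -> bool:
    """(2/p) = +1 exactly when p is 1 or 7 modulo 8, for odd primes p < limit."""
    return all((legendre(2, p) == 1) == (p % 8 in (1, 7))
               for p in range(3, limit) if is_prime(p))


def check_three(limit: int) -> bool:
    """(3/p) = +1 exactly when p is 1 or 11 modulo 12, for primes 5 <= p < limit."""
    return all((legendre(3, p) == 1) == (p % 12 in (1, 11))
               for p in range(5, limit) if is_prime(p))


print(check_two(2000), check_three(2000))
```

The same pattern, with the moduli adjusted, gives a quick way to test a conjectured answer to the exercise on $\left(\frac{\pm 5}{p}\right)$ and $\left(\frac{\pm 7}{p}\right)$ before proving it.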

Exercise. Let us determine the value of $\left(\frac{209}{479}\right)$. Note that $209 = 13 \cdot 19$, $13 \equiv 1 \pmod 4$ and $19, 479 \equiv 3 \pmod 4$. Then we may use the multiplicativity of the Legendre symbol and the Law of Quadratic Reciprocity as follows:
\begin{align*}
\left(\frac{209}{479}\right) &= \left(\frac{13}{479}\right)\left(\frac{19}{479}\right) \\
&= \left(\frac{479}{13}\right)\left(-\left(\frac{479}{19}\right)\right) \\
&= -\left(\frac{11}{13}\right)\left(\frac{4}{19}\right) \\
&= -\left(\frac{11}{13}\right) \\
&= -\left(\frac{13}{11}\right) \\
&= -\left(\frac{2}{11}\right) \\
&= 1.
\end{align*}
Note that the last equality holds because the only quadratic residues in $\mathbb{Z}_{11}$ are $[1], [3], [4], [5]$ and $[9]$. Since $[2]$ is not in this list, it is a quadratic nonresidue.

22 Multiplicative Functions

The last 16 sections were all devoted to the theory of congruences, and at this point it is time to switch gears and move towards other topics. In this section, we begin our first exposure to Analytic Number Theory. In analytic number theory, we utilize the tools of real or complex analysis in order to answer questions in number theory. For example, the techniques of analytic number theory allow us to explain the asymptotic behaviour of the functions
\[ \pi(x) = \#\{p \le x : p \text{ is prime}\} \]
or
\[ Q(x) = \#\{n \le x : n \text{ is squarefree}\}. \]
Here $\#X$ denotes the cardinality of the set $X$. The study of analytic number theory begins with the introduction of multiplicative and totally multiplicative functions.

Definition. A non-zero function $f : \mathbb{N} \to \mathbb{C}$ is called multiplicative if for any coprime positive integers $m$ and $n$ it is the case that $f(mn) = f(m) f(n)$.

Definition. A non-zero function $f : \mathbb{N} \to \mathbb{C}$ is called totally multiplicative if for any positive integers $m$ and $n$, not necessarily coprime, it is the case that $f(mn) = f(m) f(n)$.

Example. Here are some examples of multiplicative and totally multiplicative functions:

1. The indicator function $I(n)$ is totally multiplicative:
\[ I(n) = \begin{cases} 1, & \text{if } n = 1; \\ 0, & \text{if } n \neq 1; \end{cases} \]

2. The constant function $1(n)$ is totally multiplicative: $1(n) = 1$ for all $n$;

3. The identity function $i(n)$ is totally multiplicative: $i(n) = n$ for all $n$;

4. The Legendre symbol $\left(\frac{n}{p}\right)$ for a fixed odd prime $p$ is totally multiplicative in accordance with Proposition 21.1;

5. The Euler totient function $\varphi(n)$ is multiplicative, but not totally multiplicative;

6. The number of divisors function $\tau(n)$ is multiplicative, but not totally multiplicative:
\[ \tau(n) = \#\{d : d \mid n,\ d > 0\}; \]

7. The sum of divisors function $\sigma(n)$ is multiplicative, but not totally multiplicative:
\[ \sigma(n) = \sum_{\substack{d \mid n \\ d > 0}} d; \]

8. The Möbius function is multiplicative, but not totally multiplicative (you will prove this fact in Assignment 5):
\[ \mu(n) = \begin{cases} 1, & \text{if } n = 1; \\ 0, & \text{if } n \text{ is not squarefree}; \\ (-1)^k, & \text{if } n \text{ is squarefree with } k \text{ distinct prime factors}. \end{cases} \]

We will now explore some properties of multiplicative functions.

Proposition 22.4. If $m$ and $n$ are coprime positive integers, then every positive divisor $d$ of their product $mn$ comes from a unique pair of positive integers $a$ and $b$ such that $a \mid m$, $b \mid n$ and $ab = d$.

Proof. If the unique factorizations of $m$ and $n$ are given by
\[ m = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} \quad \text{and} \quad n = q_1^{f_1} q_2^{f_2} \cdots q_l^{f_l}, \]
then the unique factorization of a positive divisor $d$ of $mn$ takes the form
\[ d = p_1^{r_1} p_2^{r_2} \cdots p_k^{r_k} q_1^{s_1} q_2^{s_2} \cdots q_l^{s_l}, \]
where $0 \le r_i \le e_i$ and $0 \le s_j \le f_j$. If we now set
\[ a = p_1^{r_1} p_2^{r_2} \cdots p_k^{r_k} \quad \text{and} \quad b = q_1^{s_1} q_2^{s_2} \cdots q_l^{s_l}, \]
it becomes obvious that $a \mid m$, $b \mid n$ and $ab = d$.

Now we need to confirm that the above $a$ and $b$ are unique. Suppose that there exist positive integers $c$ and $e$ such that $c \mid m$, $e \mid n$ and $ce = d$. Then $ce = ab$. Since $c \mid m$ and $b \mid n$, it must be the case that $c$ and $b$ are coprime. Therefore $c \mid a$. By a symmetric argument, $a \mid c$, whence $a = c$, and then $b = e$.

Proposition 22.5. Let $f : \mathbb{N} \to \mathbb{C}$ be a multiplicative function. Then

1. $f(1) = 1$;

2. The function $f(n)$ is fully determined by its values at prime powers;

[32] Proposition 8.2 in Frank Zorzitto, A Taste of Number Theory.

3. The function $g(n)$ given by
\[ g(n) := \sum_{\substack{d \mid n \\ d > 0}} f(d) \]
is multiplicative.

Proof. Property 1 is obvious, because
\[ f(n) = f(1 \cdot n) = f(1) f(n). \]
By definition, $f$ is non-zero, so there exists some $n$ such that $f(n) \neq 0$. For such $n$, we may cancel $f(n)$ on both sides of the above equality, thus leaving $f(1) = 1$.

To establish property 2, let $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ be the prime factorization of $n$. Then
\begin{align*}
f(n) &= f(p_1^{e_1}) f(p_2^{e_2} \cdots p_k^{e_k}) && \text{since } \gcd(p_1^{e_1}, p_2^{e_2} \cdots p_k^{e_k}) = 1; \\
&= f(p_1^{e_1}) f(p_2^{e_2}) f(p_3^{e_3} \cdots p_k^{e_k}) && \text{since } \gcd(p_2^{e_2}, p_3^{e_3} \cdots p_k^{e_k}) = 1; \\
&\;\;\vdots \\
&= f(p_1^{e_1}) f(p_2^{e_2}) \cdots f(p_k^{e_k}).
\end{align*}
Thus if we know the values of $f(p^e)$ for all prime powers $p^e$, we know the values of $f(n)$ for all positive integers $n$.

To establish property 3, let $m$ and $n$ be coprime positive integers. We use Proposition 22.4:
\begin{align*}
g(mn) &= \sum_{d \mid mn} f(d) \\
&= \sum_{a \mid m,\ b \mid n} f(ab) && \text{by Proposition 22.4;} \\
&= \sum_{a \mid m,\ b \mid n} f(a) f(b) && \text{since } \gcd(a, b) = 1 \text{ and } f \text{ is multiplicative;} \\
&= \left(\sum_{a \mid m} f(a)\right)\left(\sum_{b \mid n} f(b)\right) \\
&= g(m) g(n).
\end{align*}
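Property 3 can be observed experimentally. The following Python sketch is an illustration under stated assumptions, not part of the original notes: the helper names `divisors`, `divisor_sum` and `is_multiplicative` are hypothetical, and multiplicativity is only tested on a finite range of coprime pairs, which is evidence rather than proof. Starting from the constant function $1(n)$ and the identity function $i(n)$, the divisor sum produces $\tau(n)$ and $\sigma(n)$.

```python
from math import gcd


def divisors(n):
    """All positive divisors of n, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]


def divisor_sum(f):
    """Return the function g with g(n) = sum of f(d) over positive divisors d of n."""
    return lambda n: sum(f(d) for d in divisors(n))


def is_multiplicative(f, limit):
    """Check f(mn) = f(m) f(n) for all coprime pairs 1 <= m, n < limit."""
    return all(f(m * n) == f(m) * f(n)
               for m in range(1, limit) for n in range(1, limit)
               if gcd(m, n) == 1)


tau = divisor_sum(lambda n: 1)    # number of divisors
sigma = divisor_sum(lambda n: n)  # sum of divisors

print(tau(12), sigma(12))  # 6 28
print(is_multiplicative(tau, 30), is_multiplicative(sigma, 30))  # True True
print(tau(4), tau(2) * tau(2))  # 3 4
```

The last line shows why $\tau$ is not totally multiplicative: the defining identity fails for the non-coprime pair $m = n = 2$.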

Proposition 22.6. The Euler totient function $\varphi(n)$ is multiplicative. Furthermore, if $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime factorization of $n$, then
\[ \varphi(n) = (p_1^{e_1} - p_1^{e_1 - 1})(p_2^{e_2} - p_2^{e_2 - 1}) \cdots (p_k^{e_k} - p_k^{e_k - 1}). \]

Proof. For an integer $x$, let us use the notation $[x]_n$ to indicate the residue class of $x$ modulo $n$. Let $m$ and $n$ be coprime integers exceeding 1. We will show that $\mathbb{Z}_{mn}^*$ is in one-to-one correspondence with the Cartesian product
\[ \mathbb{Z}_m^* \times \mathbb{Z}_n^* = \{(\alpha, \beta) : \alpha \in \mathbb{Z}_m^*,\ \beta \in \mathbb{Z}_n^*\}. \]
Let $[x]_{mn} \in \mathbb{Z}_{mn}^*$. Then $\gcd(x, mn) = 1$, which means that $\gcd(x, m) = 1$ and $\gcd(x, n) = 1$. But then $[x]_m$ and $[x]_n$ must be units in $\mathbb{Z}_m$ and $\mathbb{Z}_n$ respectively, so $[x]_m \in \mathbb{Z}_m^*$ and $[x]_n \in \mathbb{Z}_n^*$.

Conversely, if $[a]_m \in \mathbb{Z}_m^*$ and $[b]_n \in \mathbb{Z}_n^*$, then by the Chinese Remainder Theorem there exists a unique $[x]_{mn} \in \mathbb{Z}_{mn}$ such that $[x]_m = [a]_m \in \mathbb{Z}_m^*$ and $[x]_n = [b]_n \in \mathbb{Z}_n^*$. Therefore $x$ is coprime to both $m$ and $n$, and so $x$ is coprime to $mn$. Thus we conclude that $[x]_{mn} \in \mathbb{Z}_{mn}^*$.

Now that we saw that there exists a one-to-one correspondence between $\mathbb{Z}_{mn}^*$ and $\mathbb{Z}_m^* \times \mathbb{Z}_n^*$, we can conclude that $\#\mathbb{Z}_{mn}^* = \#(\mathbb{Z}_m^* \times \mathbb{Z}_n^*)$. But since the cardinality of a Cartesian product is equal to the product of the cardinalities of the individual sets, i.e. $\#(\mathbb{Z}_m^* \times \mathbb{Z}_n^*) = \#\mathbb{Z}_m^* \cdot \#\mathbb{Z}_n^*$, with the help of Exercise 10.2 we can conclude that
\[ \varphi(mn) = \#\mathbb{Z}_{mn}^* = \#(\mathbb{Z}_m^* \times \mathbb{Z}_n^*) = \#\mathbb{Z}_m^* \cdot \#\mathbb{Z}_n^* = \varphi(m)\varphi(n). \]

In order to establish the formula for $\varphi(n)$, recall that according to property 2 of Proposition 22.5 it is sufficient to compute $\varphi(p^e)$ for a prime power $p^e$. The only positive integers less than $p^e$ that are not coprime to it are
\[ p, 2p, 3p, \ldots, (p^{e-1} - 1)p. \]
There are $p^{e-1} - 1$ numbers like that in total, which means that
\[ \varphi(p^e) = (p^e - 1) - (p^{e-1} - 1) = p^e - p^{e-1}. \]
Now that we know the formula for $\varphi(p^e)$ when $p^e$ is a prime power, it is straightforward to write down the general formula for $\varphi(n)$ because it is multiplicative.
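The product formula can be compared against the definition of $\varphi(n)$ as a count of integers coprime to $n$. Below is a minimal Python sketch, not part of the original notes: `factorize`, `phi_formula` and `phi_count` are hypothetical names, and trial division is assumed to be sufficient for the small inputs involved.

```python
from math import gcd


def factorize(n):
    """Prime factorization of n >= 1 as a dict {prime: exponent}, by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def phi_formula(n):
    """Euler's totient via the product of (p^e - p^(e-1)) over prime powers of n."""
    result = 1
    for p, e in factorize(n).items():
        result *= p**e - p**(e - 1)
    return result


def phi_count(n):
    """Euler's totient straight from the definition: count 1 <= x <= n coprime to n."""
    return sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)


print(phi_formula(360), phi_count(360))  # 96 96
print(all(phi_formula(n) == phi_count(n) for n in range(1, 300)))  # True
```

For $360 = 2^3 \cdot 3^2 \cdot 5$ the formula gives $(8 - 4)(9 - 3)(5 - 1) = 96$, in agreement with the direct count.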

Proposition. The number of divisors function $\tau(n)$ is multiplicative. Furthermore, if $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime factorization of $n$, then
\[ \tau(n) = (e_1 + 1)(e_2 + 1) \cdots (e_k + 1). \]

Proof. Let $n \ge 2$ be an integer and consider the prime factorization of $n$:
\[ n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}. \]
Then every positive divisor $d$ of $n$ must be of the form
\[ d = p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k}, \]
where $0 \le f_i \le e_i$ for all $i = 1, 2, \ldots, k$. Each $f_i$ has $e_i + 1$ possibilities, so we see that there are exactly
\[ \tau(n) = (e_1 + 1)(e_2 + 1) \cdots (e_k + 1) \]
possible divisors of $n$. Now suppose that
\[ m = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} \quad \text{and} \quad n = q_1^{f_1} q_2^{f_2} \cdots q_l^{f_l} \]
are coprime, i.e. the prime numbers $p_1, p_2, \ldots, p_k, q_1, q_2, \ldots, q_l$ are distinct. Then
\[ \tau(mn) = (e_1 + 1)(e_2 + 1) \cdots (e_k + 1)(f_1 + 1)(f_2 + 1) \cdots (f_l + 1) = \tau(m)\tau(n), \]
which means that $\tau(n)$ is a multiplicative function.

Proposition. The sum of divisors function $\sigma(n)$ is multiplicative. Furthermore, if $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime factorization of $n$, then
\[ \sigma(n) = \left(\frac{p_1^{e_1 + 1} - 1}{p_1 - 1}\right)\left(\frac{p_2^{e_2 + 1} - 1}{p_2 - 1}\right) \cdots \left(\frac{p_k^{e_k + 1} - 1}{p_k - 1}\right). \]

Proof. To see that $\sigma(n)$ is multiplicative, note that
\[ \sigma(n) = \sum_{\substack{d \mid n \\ d > 0}} d = \sum_{\substack{d \mid n \\ d > 0}} i(d), \]

where $i(n) = n$ is the identity function. Since the identity function $i(n)$ is multiplicative, it follows from property 3 of Proposition 22.5 that $\sigma(n)$ is multiplicative as well.

In order to establish the formula for $\sigma(n)$, recall that according to property 2 of Proposition 22.5 it is sufficient to compute $\sigma(p^e)$ for a prime power $p^e$. The divisors of $p^e$ are $1, p, p^2, \ldots, p^e$, so
\[ \sigma(p^e) = 1 + p + p^2 + \cdots + p^e = \frac{p^{e+1} - 1}{p - 1}. \]
Note that the last equality holds because the sequence $1, p, \ldots, p^e$ constitutes an $(e+1)$-term geometric progression with the first element equal to 1 and common ratio $p$. Now that we know the formula for $\sigma(p^e)$ when $p^e$ is a prime power, it is straightforward to write down the general formula for $\sigma(n)$ because it is multiplicative.

23 The Möbius Inversion

From now on, when writing $d \mid n$, we will always assume that the divisor $d$ is positive. As we shall see, the Möbius function
\[ \mu(n) = \begin{cases} 1, & \text{if } n = 1; \\ 0, & \text{if } n \text{ is not squarefree}; \\ (-1)^k, & \text{if } n \text{ is squarefree with } k \text{ distinct prime factors} \end{cases} \]
plays a crucial role in analytic number theory.

Proposition (Proposition 8.6 in Frank Zorzitto, A Taste of Number Theory). For every $n \ge 1$,
\[ \sum_{d \mid n} \mu(d) = I(n). \]

Proof. Let $g(n) = \sum_{d \mid n} \mu(d)$. Note that
\[ g(1) = \mu(1) = 1 = I(1). \]

Now let $n \ge 2$. Since $\mu(n)$ is multiplicative, it follows from property 3 of Proposition 22.5 that $g(n)$ is multiplicative as well. By property 2 of Proposition 22.5, it suffices to check that $g(p^e) = 0$ for every prime power $p^e$ with $e \ge 1$. We have
\[ g(p^e) = \sum_{d \mid p^e} \mu(d) = \mu(1) + \mu(p) + \mu(p^2) + \cdots + \mu(p^e) = 1 - 1 + 0 + \cdots + 0 = 0 = I(p^e), \]
so the result follows.

The Möbius function is important because it allows us to express the function $f$ in terms of $g$ whenever these two functions are connected by the relation
\[ g(n) = \sum_{d \mid n} f(d). \]
The operation of expressing $f$ through $g$ is called the Möbius inversion.

Proposition (Theorem 8.7 in Frank Zorzitto, A Taste of Number Theory). If $f$ and $g$ are arbitrary functions, not necessarily multiplicative, that are defined on the set of positive integers and satisfy
\[ g(n) = \sum_{d \mid n} f(d) \]
for all $n \ge 1$, then
\[ f(n) = \sum_{d \mid n} g(d)\,\mu\!\left(\frac{n}{d}\right) = \sum_{d \mid n} g\!\left(\frac{n}{d}\right)\mu(d). \]

Proof. First, note that for a positive integer $n$ and a pair of positive integers $d, e$ it is the case that $de \mid n$ if and only if $d \mid n$ and $e \mid \frac{n}{d}$.

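Second, note that
\begin{align*}
\sum_{d \mid n} g\!\left(\frac{n}{d}\right)\mu(d) &= \sum_{d \mid n} \left(\sum_{e \mid \frac{n}{d}} f(e)\right)\mu(d) \\
&= \sum_{d \mid n,\ e \mid \frac{n}{d}} f(e)\mu(d) \\
&= \sum_{ed \mid n} f(e)\mu(d) \\
&= \sum_{e \mid n,\ d \mid \frac{n}{e}} f(e)\mu(d) \\
&= \sum_{e \mid n} \left(\sum_{d \mid \frac{n}{e}} \mu(d)\right) f(e) \\
&= \sum_{e \mid n} I\!\left(\frac{n}{e}\right) f(e) \\
&= f(n).
\end{align*}

Before proceeding to examples of the Möbius inversion, let us prove the following fact about the Euler totient function $\varphi(n)$.

Proposition 23.3 (Proposition 8.4 in Frank Zorzitto, A Taste of Number Theory). For every positive integer $n$,
\[ \sum_{d \mid n} \varphi(d) = n. \]

Proof. Let $g(n) = \sum_{d \mid n} \varphi(d)$. By property 3 of Proposition 22.5, the function $g(n)$ is multiplicative. Therefore, by property 2 of Proposition 22.5, it is sufficient to understand its values $g(p^e)$ for prime powers $p^e$. Using the formula given in Proposition 22.6, we obtain
\begin{align*}
g(p^e) &= \varphi(1) + \varphi(p) + \varphi(p^2) + \cdots + \varphi(p^e) \\
&= 1 + (p - 1) + (p^2 - p) + \cdots + (p^e - p^{e-1}) \\
&= p^e.
\end{align*}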

And now, since $g(n)$ is multiplicative, for any integer $n$ with the prime factorization $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ we may conclude that
\[ g(n) = g(p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}) = g(p_1^{e_1}) g(p_2^{e_2}) \cdots g(p_k^{e_k}) = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} = n. \]

Now that we established the connection between the identity function $i(n)$ and the Euler totient function $\varphi(n)$, we can write down a new formula for $\varphi(n)$ via the Möbius inversion.

Example. Let us prove that for every positive integer $n$ it is the case that
\[ \varphi(n) = n \sum_{d \mid n} \frac{\mu(d)}{d}. \]
By Proposition 23.3, the identity function $i(n)$ and the Euler totient function $\varphi(n)$ are connected by means of the relation
\[ i(n) = \sum_{d \mid n} \varphi(d). \]
Now the Möbius inversion formula tells us that
\[ \varphi(n) = \sum_{d \mid n} \mu(d)\, i\!\left(\frac{n}{d}\right) = \sum_{d \mid n} \mu(d)\,\frac{n}{d} = n \sum_{d \mid n} \frac{\mu(d)}{d}. \]

Example. Note that
\[ \sigma(n) = \sum_{d \mid n} d, \]
which means that there is a connection between the sum of divisors function $\sigma(n)$ and the identity function $i(n)$. But then it follows from the Möbius inversion formula that
\[ n = \sum_{d \mid n} \mu(d)\,\sigma\!\left(\frac{n}{d}\right). \]

Exercise. The von Mangoldt function, denoted by $\Lambda(n)$, is defined as
\[ \Lambda(n) = \begin{cases} \log p, & \text{if } n = p^k \text{ for some prime } p \text{ and integer } k \ge 1; \\ 0, & \text{otherwise}. \end{cases} \]

Prove that
\[ \log n = \sum_{d \mid n} \Lambda(d), \]
and then use the Möbius inversion to establish the formula
\[ \Lambda(n) = -\sum_{d \mid n} \mu(d) \log d. \]

24 The Prime Number Theorem

In 1797 or 1798, Legendre conjectured that the number of primes up to $x$ is approximated by the function $\frac{x}{A \log x + B}$, where $A$ and $B$ are constants left unspecified. According to the recollections of Gauss, in the year 1792 or 1793, when he was 15 or 16 years old, he made a similar observation. In simple terms, this conjecture states that, up to $x$, there are roughly $\frac{x}{\log x}$ prime numbers. The Prime Number Theorem confirms the conjecture made by Legendre and Gauss. It is one of the most renowned results in Analytic Number Theory. The Prime Number Theorem was proved independently by Jacques Hadamard and Charles Jean de la Vallée-Poussin in 1896.

Theorem (The Prime Number Theorem). Let
\[ \pi(x) := \#\{p \le x : p \text{ is prime}\}. \]
Then
\[ \lim_{x \to \infty} \frac{\pi(x)}{x / \log x} = 1. \]

A more accurate statement of the Prime Number Theorem is the following one:
\[ \pi(x) = \operatorname{Li}(x) + O\!\left(x e^{-a\sqrt{\log x}}\right), \]
where $a$ is a positive constant and
\[ \operatorname{Li}(x) = \int_2^x \frac{dt}{\log t}. \]
Indeed, the function $\operatorname{Li}(x)$ describes the behaviour of the prime counting function more precisely than $\frac{x}{\log x}$. In this form, we also see the error term, which tells

us how far the value of $\pi(x)$ is from the value of $\operatorname{Li}(x)$. Since the proof was introduced, the error term was improved many times. But most importantly, it is widely believed that the error term can be as low as $O(\sqrt{x} \log x)$ if the so-called Riemann Hypothesis is true.

The proof of the Prime Number Theorem requires some delicate analysis of the zeros of the Riemann zeta function
\[ \zeta(s) := \sum_{n=1}^{\infty} \frac{1}{n^s}, \]
where $s$ is a complex number with $\operatorname{Re}(s) > 1$. The Riemann Hypothesis concerns the distribution of the zeros of $\zeta(s)$. It is undoubtedly one of the hardest open mathematical problems. At the University of Waterloo, there are several experts who work in the area of Analytic Number Theory and on the problems related to the distribution of zeta zeros, including Yu-Ru Liu and Michael Rubinstein.

It is worthwhile mentioning that there is a very interesting elementary argument related to the Prime Number Theorem given by Erdős. Though it does not provide an asymptotic formula for $\pi(x)$, it does explain why the function $\frac{x}{\log x}$ captures the behaviour of $\pi(x)$. The proof does not involve any analytic techniques and should be quite accessible to second or third year undergraduate students in mathematics. To those who are interested in the subject, we recommend this proof for further reading.

Theorem (Erdős, 1949). For $x \ge 2$,
\[ \left(\frac{3 \log 2}{8}\right) \frac{x}{\log x} < \pi(x) < (6 \log 2)\, \frac{x}{\log x}. \]

Proof. See Theorem 4 in ca.pure-mathematics/files/uploads/files/pmath440notes_0.pdf.

25 The Density of Squarefree Numbers

In this section, we will see one basic analytical result on the density of squarefree numbers.
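Before the formal statement, the density in question can be observed numerically. The Python sketch below is an illustration only, not part of the original notes: `count_squarefree` is a hypothetical helper that counts the squarefree integers up to $x$ with a simple sieve, and the printed ratio is compared with $6/\pi^2$, the limiting value established in this section.

```python
from math import isqrt, pi


def count_squarefree(x):
    """Number of squarefree integers n with 1 <= n <= x, via a sieve over squares."""
    squarefree = [True] * (x + 1)
    for d in range(2, isqrt(x) + 1):
        # every multiple of d^2 is divisible by a square greater than 1
        for multiple in range(d * d, x + 1, d * d):
            squarefree[multiple] = False
    return sum(squarefree[1:])


for x in (10, 1000, 100000):
    q = count_squarefree(x)
    print(x, q, q / x, 6 / pi**2)
```

Up to 10 the squarefree integers are 1, 2, 3, 5, 6, 7 and 10, so the first count is 7. For larger $x$ the ratio settles near $0.6079$.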

Theorem 25.1. Let
\[ Q(x) = \#\{n \le x : n \text{ is squarefree}\}. \]
Then the natural asymptotic density of squarefree numbers is given by
\[ \lim_{x \to \infty} \frac{Q(x)}{x} = \frac{6}{\pi^2}. \]

In other words, Theorem 25.1 tells us that over 60% of all positive integers are squarefree. Before proceeding to the proof, let us establish the following simple lemma.

Lemma 25.2. Let $f(n)$ be a multiplicative function such that the series $\sum_{n=1}^{\infty} |f(n)|$ converges. Then
\[ \sum_{n=1}^{\infty} f(n) = \prod_{p \text{ is prime}} \left(1 + f(p) + f(p^2) + \ldots\right). \]

Proof. For a fixed positive number $y$, the following identity holds:
\[ \prod_{\substack{p \text{ is prime} \\ p < y}} \left(1 + f(p) + f(p^2) + \ldots\right) = \sum_{\substack{n \ge 1 \\ \text{if } p \mid n \text{ then } p < y}} f(n). \]
As $y$ approaches infinity, the right hand side approaches $\sum_{n=1}^{\infty} f(n)$, while the left hand side approaches the desired Euler product. Since the series $\sum_{n=1}^{\infty} |f(n)|$ converges, it must be the case that
\[ \sum_{n \ge y} |f(n)| \to 0 \]
as $y$ approaches infinity. We can utilize this fact in order to show that, as $y$ approaches infinity,
\[ \left| \sum_{n=1}^{\infty} f(n) - \sum_{\substack{n \ge 1 \\ \text{if } p \mid n \text{ then } p < y}} f(n) \right| = \left| \sum_{\substack{n \ge 1 \\ \exists\, p \mid n :\ p \ge y}} f(n) \right| \le \sum_{n \ge y} |f(n)| \to 0. \]

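This observation allows us to conclude that
\begin{align*}
\sum_{n=1}^{\infty} f(n) &= \lim_{y \to \infty} \sum_{\substack{n \ge 1 \\ \text{if } p \mid n \text{ then } p < y}} f(n) \\
&= \lim_{y \to \infty} \prod_{\substack{p \text{ is prime} \\ p < y}} \left(1 + f(p) + f(p^2) + \ldots\right) \\
&= \prod_{p \text{ is prime}} \left(1 + f(p) + f(p^2) + \ldots\right).
\end{align*}

Proof (of Theorem 25.1). Note that
\[ \mu^2(n) = \begin{cases} 1, & \text{if } n \text{ is squarefree}; \\ 0, & \text{otherwise}, \end{cases} \]
which means that
\[ Q(x) = \sum_{n \le x} \mu^2(n). \]
Let $l(n)$ denote the largest integer such that $l(n)^2 \mid n$. Then it follows from the identity $\sum_{d \mid n} \mu(d) = I(n)$, proved in Section 23, that
\begin{align*}
\mu^2(n) &= \begin{cases} 1, & \text{if } l(n) = 1; \\ 0, & \text{otherwise}; \end{cases} \\
&= I(l(n)) \\
&= \sum_{d \mid l(n)} \mu(d) \\
&= \sum_{d \,:\, d^2 \mid n} \mu(d).
\end{align*}
As it turns out, this formula is much easier to analyze than $\mu^2(n)$. Now let $\{x\} := x - \lfloor x \rfloor$ denote the fractional part of $x$. Note that $\{x\}$ satisfies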

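$0 \le \{x\} < 1$ for any $x$. Then
\begin{align*}
Q(x) &= \sum_{n \le x} \mu^2(n) \\
&= \sum_{n \le x} \sum_{d \,:\, d^2 \mid n} \mu(d) \\
&= \sum_{d \le \sqrt{x}} \mu(d) \sum_{\substack{n \le x \\ d^2 \mid n}} 1 \\
&= \sum_{d \le \sqrt{x}} \mu(d) \left\lfloor \frac{x}{d^2} \right\rfloor \\
&= \sum_{d \le \sqrt{x}} \mu(d) \left(\frac{x}{d^2} - \left\{\frac{x}{d^2}\right\}\right) \\
&= x \sum_{d \le \sqrt{x}} \frac{\mu(d)}{d^2} - \sum_{d \le \sqrt{x}} \mu(d) \left\{\frac{x}{d^2}\right\}.
\end{align*}
Since $\left|\mu(d)\{x/d^2\}\right| < 1$, we conclude that
\begin{align*}
Q(x) &< x \sum_{d \le \sqrt{x}} \frac{\mu(d)}{d^2} + \sqrt{x} \\
&= x \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} - x \sum_{d > \sqrt{x}} \frac{\mu(d)}{d^2} + \sqrt{x} \\
&\le x \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} + x \left| \sum_{d > \sqrt{x}} \frac{\mu(d)}{d^2} \right| + \sqrt{x}.
\end{align*}
Now observe that
\[ \left| \sum_{d > \sqrt{x}} \frac{\mu(d)}{d^2} \right| \le \sum_{d > \sqrt{x}} \frac{1}{d^2} < \int_{\lfloor \sqrt{x} \rfloor}^{\infty} \frac{dt}{t^2} = \frac{1}{\lfloor \sqrt{x} \rfloor} \le \frac{2}{\sqrt{x}}. \]
Above we utilized the fact that $\lfloor \sqrt{x} \rfloor \ge \frac{\sqrt{x}}{2}$ for all $x \ge 2$. For convenience, define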

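the constant $c$ as
\[ c := \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2}. \]
Then
\[ Q(x) < cx + x \left| \sum_{d > \sqrt{x}} \frac{\mu(d)}{d^2} \right| + \sqrt{x} < cx + x \cdot \frac{2}{\sqrt{x}} + \sqrt{x} = cx + 3\sqrt{x}. \]
Through analogous observations, we can also establish the lower bound on $Q(x)$, and obtain the final relation
\[ cx - 3\sqrt{x} < Q(x) < cx + 3\sqrt{x}. \]
Now the only thing that is left for us to do is to compute $c$. Recall that
\[ \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}. \]
This result was proved by Leonhard Euler. Further, by the argument analogous to the second proof of Theorem 2.10, we see that
\[ \frac{\pi^2}{6} = \sum_{n=1}^{\infty} \frac{1}{n^2} = \prod_{p \text{ is prime}} \left(1 + \frac{1}{p^2} + \frac{1}{p^4} + \ldots\right) = \prod_{p \text{ is prime}} \left(1 - \frac{1}{p^2}\right)^{-1}. \]
Note that the last equality holds due to the formula for the infinite geometric series. Since the function $\mu(n)/n^2$ is multiplicative and
\[ \sum_{d=1}^{\infty} \left| \frac{\mu(d)}{d^2} \right| \le \sum_{d=1}^{\infty} \frac{1}{d^2} = \frac{\pi^2}{6} < \infty, \]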

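we can apply Lemma 25.2 to the series $\sum_{d=1}^{\infty} \mu(d)/d^2$ in order to obtain
\begin{align*}
c = \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} &= \prod_{p \text{ is prime}} \left(1 + \frac{\mu(p)}{p^2} + \frac{\mu(p^2)}{p^4} + \ldots\right) \\
&= \prod_{p \text{ is prime}} \left(1 - \frac{1}{p^2}\right) \\
&= \left(\sum_{n=1}^{\infty} \frac{1}{n^2}\right)^{-1} \\
&= \frac{6}{\pi^2}.
\end{align*}
Thus we conclude that
\[ \frac{6}{\pi^2} x - 3\sqrt{x} < Q(x) < \frac{6}{\pi^2} x + 3\sqrt{x}, \]
and further
\[ \frac{6}{\pi^2} - \frac{3}{\sqrt{x}} < \frac{Q(x)}{x} < \frac{6}{\pi^2} + \frac{3}{\sqrt{x}}. \]
By letting $x$ tend to infinity, we see that the Squeeze Theorem implies
\[ \lim_{x \to \infty} \frac{Q(x)}{x} = \frac{6}{\pi^2}. \]

26 Perfect Numbers

One of the oldest problems in mathematics concerns the existence of odd perfect numbers. Around 300 BC, perfect numbers were introduced by Euclid in his book Elements (VII.22).

Definition. A positive integer $n$ is called perfect if the sum of its positive divisors is equal to $2n$, or in other words $\sigma(n) = 2n$.

The first eight perfect numbers are 6, 28, 496, 8128, 33550336, 8589869056, 137438691328, 2305843008139952128.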

Aside from the fact that they tend to grow pretty quickly (which we shall explain later), we may notice one thing that they all have in common, namely that they are all even. But do there exist odd perfect numbers? We do not know. This question was studied thoroughly over the past two centuries, and quite a few things are known about odd perfect numbers. For example, if an odd perfect number $n$ exists, it must satisfy the following three (out of many other) criteria:

1. n > ;

2. $n$ has at least 101 prime factors and at least 10 distinct prime factors;

3. The largest prime factor of $n$ is greater than

In 2003, Carl Pomerance gave a heuristic argument for why the existence of odd perfect numbers is highly unlikely. Those who are interested can find his argument here:

Unlike odd perfect numbers, we do know that even perfect numbers exist. Even more than that, we know exactly what even perfect numbers look like. However, we still do not know whether there are infinitely many even perfect numbers. As we shall see later, this problem is equivalent to showing that there are infinitely many Mersenne primes.

Definition. Let $M_n := 2^n - 1$. An integer $M_p = 2^p - 1$ is called a Mersenne prime if it is prime.

The first eight Mersenne primes are 3, 7, 31, 127, 8191, 131071, 524287, 2147483647.

As we will see in the proof of the Euclid-Euler Theorem, which was proved by Leonhard Euler in 1747, the even perfect numbers and Mersenne primes are closely related.

Theorem (Euclid-Euler Theorem, 1747; Theorem 8.5 in Frank Zorzitto, A Taste of Number Theory). An even positive integer $n$ is a perfect number if and only if it has the form $n = 2^{p-1} M_p$, where $M_p$ is a Mersenne prime.

Proof. The sufficient condition was proved by Euclid around 300 BC. You are asked to reproduce his proof in Assignment 5, so we omit it in these lecture notes.

For the necessary condition, suppose that $n$ is even and perfect. Let us write $n = 2^{p-1} m$, where $p \ge 2$ and $m$ is odd. Note that $p \ge 2$ because $n$ is even. We will show that $m = 2^p - 1$, and that $m$ is prime. We have that $n$ is perfect, and so
\[ \sigma(n) = 2n = 2^p m. \]
Because $2^{p-1}$ and $m$ are coprime and $\sigma$ is multiplicative, we also have
\[ \sigma(n) = \sigma(2^{p-1})\sigma(m). \]
By adding up the divisors of $2^{p-1}$ we obtain
\[ \sigma(2^{p-1}) = 1 + 2 + 2^2 + \cdots + 2^{p-1} = 2^p - 1. \]
We conclude that
\[ \sigma(n) = (2^p - 1)\sigma(m), \]
and so
\[ 2^p m = (2^p - 1)\sigma(m). \]
Since $2^p$ and $2^p - 1$ are coprime, $(2^p - 1) \mid m$. So $m = (2^p - 1)d$ for some positive integer $d$.

Now we need to prove that in the expression $m = (2^p - 1)d$ we have $d = 1$. We plug this expression into the equality $2^p m = (2^p - 1)\sigma(m)$ in order to obtain
\[ 2^p (2^p - 1) d = (2^p - 1)\sigma(m), \]
and thus $2^p d = \sigma(m)$. From $m = (2^p - 1)d$ and $2^p d = \sigma(m)$ we come to
\[ m + d = 2^p d = \sigma(m). \]
Now suppose that $d > 1$. Since $d < m$, there are at least three divisors of $m$, namely 1, $d$ and $m$. So $\sigma(m) \ge m + d + 1$, and this contradicts the fact that $\sigma(m) = m + d$. Therefore $d = 1$.

To see that $m$ is prime, note that $\sigma(m) = m + d = m + 1$. Since the divisors of $m$ add up to $m + 1$, our $m$ can have only 1 and $m$ as divisors, which makes $m$ a prime. Hence our even perfect number $n$ is of the form $2^{p-1} M_p$, where $M_p = 2^p - 1$ is a Mersenne prime.
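Both directions of the Euclid-Euler Theorem can be checked on small cases. The Python sketch below is an illustration, not part of the original notes: the helper names are hypothetical, primality is tested by trial division, and the exponent is capped at 13 so that every computation stays small. It builds $2^{p-1} M_p$ from each Mersenne prime $M_p$ found, confirms $\sigma(n) = 2n$, and then compares against a brute-force search for even perfect numbers up to 10000.

```python
def is_prime(n):
    """Primality by trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def sigma(n):
    """Sum of the positive divisors of n, pairing each divisor d with n // d."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


def even_perfect_numbers(max_p):
    """Numbers 2^(p-1) * (2^p - 1) for 2 <= p <= max_p with 2^p - 1 prime."""
    result = []
    for p in range(2, max_p + 1):
        mersenne = 2**p - 1
        if is_prime(mersenne):
            result.append(2**(p - 1) * mersenne)
    return result


candidates = even_perfect_numbers(13)
print(candidates)                                  # [6, 28, 496, 8128, 33550336]
print(all(sigma(n) == 2 * n for n in candidates))  # True

brute_force = [n for n in range(2, 10001, 2) if sigma(n) == 2 * n]
print(brute_force)                                 # [6, 28, 496, 8128]
```

The brute-force list agrees with the constructed one in the range searched, which is what the necessary condition of the theorem asserts.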

Though we do not know if there are infinitely many Mersenne primes, we do know quite a few of them. On January 7th 2016, The Great Internet Mersenne Prime Search reported the discovery of the 49th Mersenne prime, which is the largest Mersenne prime known to date. This prime is $M_{74207281}$, and it has 22,338,618 decimal digits. If you want to make a significant impact on Computational Number Theory, try to search for other Mersenne primes!

27 Pythagorean Triples

In Section 4, we learned how to solve the linear Diophantine equation $ax + by = c$. We will now turn our attention to equations of degree two or more. The analysis of such equations can be much more challenging, and many Diophantine equations, such as Thue equations, remain objects of active research nowadays. In this section, we will classify all positive integer solutions to the Pythagorean equation
\[ x^2 + y^2 = z^2. \]
Note that if the integers $x$, $y$ and $z$ satisfy the above equation, then so do the integers $dx$, $dy$ and $dz$ for any integer $d$. Thus it is only interesting to consider the case when $\gcd(x, y, z) = 1$. In this case, we call the triple $(x, y, z)$ a primitive solution. The first three primitive solutions to the Pythagorean equation are
\[ (x, y, z) = (3, 4, 5),\ (5, 12, 13) \text{ and } (8, 15, 17). \]

Theorem 27.1. Suppose integers $x$, $y$ and $z$ satisfy the Pythagorean equation $x^2 + y^2 = z^2$. Then there exist integers $d, m, n$ such that
\[ x = d(n^2 - m^2), \quad y = 2dmn, \quad z = d(n^2 + m^2). \]

Proof (taken from Section 1.1 of M. J. Jacobson, Jr. and H. C. Williams, Solving the Pell Equation). Let $d = \gcd(x, y, z)$. Then the triple $(x/d, y/d, z/d)$ is also a solution, so without loss of generality we may assume that $\gcd(x, y, z) = 1$, i.e. $(x, y, z)$ is a primitive solution. From here it follows that $x$ and $y$ have different parity. Indeed, they cannot both be even, since then $z$ would be even as well, contradicting primitivity, and if we assume that both $x$ and $y$ are odd, then $x^2 + y^2 \equiv 2 \pmod 4$, which contradicts the fact that $z^2 \equiv 0, 1 \pmod 4$ for any integer $z$. Without loss of generality, we may assume that $x$ is odd and $y$ is even, which means that $z$ is odd. Now we write
\[ y^2 = z^2 - x^2 = (z - x)(z + x). \]

If we let $g = \gcd(z - x, z + x) = \gcd(2z, z + x) = \gcd(z - x, 2x)$ (see Proposition 5.1), then $g \mid 2z$ and $g \mid 2x$, which means that $g \mid \gcd(2z, 2x) = 2\gcd(z, x)$. Since $(x, y, z)$ is a primitive solution, it must be the case that $\gcd(z, x) = 1$. This means that $g \mid 2$, and since $x$ and $z$ are odd it must be the case that $g = 2$. Now we can write
\[ \left(\frac{y}{2}\right)^2 = \left(\frac{z - x}{2}\right)\left(\frac{z + x}{2}\right). \]
Since the value on the left hand side of the above equality is a perfect square and $\frac{z - x}{2}$, $\frac{z + x}{2}$ are coprime integers, it must be the case that $\frac{z - x}{2}$ and $\frac{z + x}{2}$ are perfect squares. Put
\[ \frac{z - x}{2} = m^2 \quad \text{and} \quad \frac{z + x}{2} = n^2. \]
But then $x = n^2 - m^2$, $y = 2mn$ and $z = n^2 + m^2$. Now we see that for any integer $d$ the identity
\[ \left(d(n^2 - m^2)\right)^2 + (2dmn)^2 = \left(d(n^2 + m^2)\right)^2 \]
holds, which means that all solutions $(x, y, z)$ to $x^2 + y^2 = z^2$ are of the form $(d(n^2 - m^2), 2dmn, d(n^2 + m^2))$, as claimed.

28 Fermat's Infinite Descent. Fermat's Last Theorem

Perhaps the most famous mathematical story is the story of Fermat's Last Theorem. Around 1637, Fermat wrote his Last Theorem in the margin of his copy of Diophantus's Arithmetica. When reformulated, his claim reads as follows:

Theorem (Fermat's Last Theorem). Let $n \ge 3$. Then the equation $x^n + y^n = z^n$ has no solutions in positive integers $x$, $y$ and $z$.

He claimed to have discovered a truly marvellous proof of this fact, but could not write it down because the margin of the book which he was reading was too narrow to contain all of the proof.

Many mathematicians tried to establish the proof of Fermat's Last Theorem. The case $n = 4$ was proved by Fermat himself. In 1753, Euler proved it for the case $n = 3$. Alternative proofs were given by Kausler, Legendre, Calzolari, Lamé, and many others. In his proof, Euler utilized Fermat's idea of infinite

descent, which we shall discuss in this section. The case $n = 5$ was proved by Dirichlet and Legendre around 1825, and alternative proofs were given by Gauss, Lebesgue, Lamé, and others. The case $n = 7$ was proved by Gabriel Lamé.

In the 1820s, Sophie Germain developed an approach to attack the problem for several exponents at the same time. In particular, she managed to show that the first case of Fermat's Last Theorem, in which the exponent $p$ does not divide $xyz$, holds for all odd primes $p < 100$.

In 1847, Gabriel Lamé suggested approaching the problem by factoring the equation $x^p + y^p = z^p$ for an odd prime $p$ as follows:
\[ z^p = x^p + y^p = (x + y)(x + \zeta_p y)(x + \zeta_p^2 y) \cdots (x + \zeta_p^{p-1} y), \qquad (5) \]
where $\zeta_p = \exp(2\pi i / p)$ is a primitive $p$-th root of unity. If instead of the standard ring of integers $\mathbb{Z}$ one considers the ring of integers
\[ \mathbb{Z}[\zeta_p] = \{x_0 + x_1 \zeta_p + x_2 \zeta_p^2 + \cdots + x_{p-1} \zeta_p^{p-1} : x_0, x_1, \ldots, x_{p-1} \in \mathbb{Z}\}, \]
then one would hope that such notions as unique factorization or coprimality make sense in $\mathbb{Z}[\zeta_p]$, just like they do in $\mathbb{Z}$. Assuming that this is the case, one could show that the algebraic integers $x + y, x + \zeta_p y, \ldots, x + \zeta_p^{p-1} y$ are coprime, and since the expression (5) has a $p$-th power of an integer $z$ on its left hand side, one could then hope that $x + \zeta_p^i y = q_i^p$ for some $q_i \in \mathbb{Z}[\zeta_p]$, where $i = 0, 1, \ldots, p - 1$. In other words, each of the numbers $x + y, x + \zeta_p y, \ldots, x + \zeta_p^{p-1} y$ is a perfect $p$-th power, and one could prove that this is impossible. Note how similar this idea is to the one presented in the proof of Theorem 27.1.

Unfortunately, there is a flaw in this argument: it is not necessarily true that the ring $\mathbb{Z}[\alpha]$ for some algebraic number $\alpha$ has unique factorization. Perhaps the most famous example is that in the ring
\[ \mathbb{Z}[\sqrt{-5}] = \{x_1 + x_2 \sqrt{-5} : x_1, x_2 \in \mathbb{Z}\} \]
one can write the number 6 in two different ways:
\[ 6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5}). \]
The odd primes $p$ such that the elements of the ring $\mathbb{Z}[\zeta_p]$ may not possess unique factorization are called irregular primes. They are called regular otherwise.
The first eight irregular primes are 37, 59, 67, 101, 103, 131, 149 and 157.

108 Therefore Lamé s strategy applies to all primes p < 100, except for p = 37, 59 and 67. Around 1850, Ernst Kummer managed to prove that for all regular primes the Fermat equation x p +y p = z p has no solutions in positive integers when p is an odd prime. However, it is still unknown whether there are infinitely many regular primes. In 1964, Carl Ludwig Siegel conjectured that approximately 60.65% of all prime numbers are regular. The techniques suggest by Lamé and Kummer (and Euler before that) evolved into a whole new area of mathematics, known nowadays as the Algebraic Number Theory. The next few sections will contain a brief introduction to this subject. The Fermat s Last Theorem was proved by the English mathematician Andrew Wiles. His proof was published in 1994 in the special issue of Annals of Mathematics. The original paper is available here: edu/~lekheng/flt/wiles.pdf. As an exercise: try to understand at least the first page! Since the Fields medal, which is one of the most important awards for mathematicians, is restricted to those under age 40, and Andrew Wiles proved the Fermat s Last Theorem at the age 41, he received a silver plaque from the International Mathematical Union instead of the Fields medal. The proof of Andrew Wiles combined many areas of number theory together. It is an interconnection of the Theory of Elliptic Curves, Theory of Modular Forms, Representation Theory, Iwasawa Theory, and many other mathematical subjects. In short, Andrew Wiles managed to do the following. Consider the equation y 2 = x 3 + ax + b, where a and b are complex numbers such that 4a b 2 0. When a and b are real, such an equation defines a plane curve, called an elliptic curve. In 1985, the German mathematician Gerhard Frey pointed out that for an integer n 3 the elliptic curve y 2 = x(x a n )(x + b n ), where a and b are positive integers such that a n +b n = c n for some integer c, must be very special. 
In particular, he pointed out that such a curve must be semistable and non-modular. The fact that it is non-modular would then contradict the so-called Taniyama-Shimura Conjecture, proposed by the Japanese mathematicians Yutaka Taniyama and Goro Shimura. The conjecture stated that every elliptic curve, semistable or not, has to be modular. Andrew Wiles managed to prove this conjecture in the semistable case. Fermat's Last Theorem then follows from this result. The fact that all elliptic curves, semistable or not, are modular was proved in 2001 by Christophe Breuil, Brian Conrad, Fred Diamond and

Richard Taylor. This result is known as the Modularity Theorem.

It took more than 350 years for the proof of Fermat's Last Theorem to be discovered. Fermat claimed that he had a proof of the theorem. Of course, it is highly unlikely that the argument he had in mind was as involved as the one given by Andrew Wiles. Most likely, Fermat believed that the theorem could be proved using the technique of infinite descent, which he developed. This technique allowed him to prove the theorem in the special case when $n = 4$. We present a more general result in the proposition below.

The idea of infinite descent can be summarized as follows: when considering certain Diophantine equations, like $x^3 + 2y^3 + 4z^3 = 0$ or $x^4 + y^4 = z^2$, one can show that the existence of one solution leads to the existence of another solution, which is smaller than the previous one. One would then obtain an infinite strictly decreasing sequence of positive integers $x_1 > x_2 > x_3 > \cdots$, which would contradict the fact that the natural numbers are bounded below by 1. We will demonstrate the application of this technique in two special cases. More examples can be found in the following survey of Keith Conrad: blurbs/ugradnumthy/descent.pdf.

Proposition (Fermat, 1636) [38] The equation $x^4 + y^4 = z^2$ has no solutions in positive integers $x$, $y$ and $z$.

Proof. By Theorem 27.1, every primitive solution $(x, y, z)$ to the equation $x^2 + y^2 = z^2$ with $x$ odd must be of the form
\[ x = n^2 - m^2, \quad y = 2mn, \quad z = n^2 + m^2. \]
Assume that there is a solution to $x^4 + y^4 = z^2$, where $x$, $y$ and $z$ are positive integers. Without loss of generality, we may suppose that $\gcd(x, y) = 1$, which means that $\gcd(x, z) = 1$ and $\gcd(y, z) = 1$. We will find a second positive integer solution $(x', y', z')$ with $\gcd(x', y') = 1$ that is smaller than $(x, y, z)$ in a suitable sense.

Since $x^4 + y^4 = z^2$ and $\gcd(x, y) = 1$, exactly one of $x$ and $y$ is odd. Indeed, they cannot both be even because they are coprime, and they cannot both be odd, for otherwise $z^2 \equiv 2 \pmod{4}$, and this congruence, as we saw before, has no solutions.
Without loss of generality, we may assume that $x$ is odd and $y$ is even. Then $z$ is odd. Since $(x^2)^2 + (y^2)^2 = z^2$, the triple $(x^2, y^2, z)$ must be a primitive Pythagorean triple, so there exist coprime positive integers $m$ and $n$ such that
\[ x^2 = n^2 - m^2, \quad y^2 = 2mn, \quad z = n^2 + m^2. \tag{6} \]

[38] Theorem 3.1 in Keith Conrad, Proofs by descent.

Since $x^2 + m^2 = n^2$ and $\gcd(m, n) = 1$, we conclude that $(x, m, n)$ is another primitive Pythagorean triple. Since $x$ is odd, the formula for primitive Pythagorean triples once again tells us that
\[ x = a^2 - b^2, \quad m = 2ab, \quad n = a^2 + b^2, \tag{7} \]
where $a$ and $b$ are positive and coprime. Substituting the values of $m$ and $n$ from (7) into the second equation of (6), we obtain $y^2 = 4(a^2 + b^2)ab$. Since $y$ is even,
\[ \left(\frac{y}{2}\right)^2 = (a^2 + b^2)ab. \]
Since $\gcd(a, b) = 1$, the three factors on the right are pairwise coprime. Since they are all positive and their product is a perfect square, each of them must be a perfect square:
\[ a = x'^2, \quad b = y'^2, \quad a^2 + b^2 = z'^2. \]
Since $\gcd(a, b) = 1$, it must be the case that $\gcd(x', y') = 1$. Now the last equation can be rewritten as $x'^4 + y'^4 = z'^2$, so $(x', y', z')$ is another solution to our original equation with $\gcd(x', y') = 1$.

Now we compare $z'$ to $z$. Since
\[ 0 < z' \leq z'^2 = a^2 + b^2 = n \leq n^2 < z, \]
we see that from one primitive solution $(x, y, z)$ to $x^4 + y^4 = z^2$ we can produce another primitive solution $(x', y', z')$ such that $z > z'$. But then we could produce an infinite strictly decreasing sequence of positive integers $z > z' > z'' > \cdots$, and this contradicts the fact that the positive integers are bounded below by 1.

Corollary Fermat's Last Theorem holds for $n = 4$. In other words, the equation $x^4 + y^4 = z^4$ has no solutions in positive integers $x$, $y$ and $z$.

Another example of a proof by infinite descent is the proof of the irrationality of $\sqrt{2}$. This proof was discovered by the Pythagoreans, who showed that the diagonal of a square cannot be represented as a ratio of two integers. The Pythagoreans kept the proof of this fact a secret and, according to the legend, its discoverer (possibly Hippasus of Metapontum) was murdered for divulging it.

Proposition The number $\sqrt{2}$ is irrational. That is, there exist no integers $m$ and $n$ such that $\sqrt{2} = m/n$.

Proof. Suppose not, so that there exist positive integers $m$ and $n$ such that $\sqrt{2} = m/n$. Then $m = \sqrt{2}\,n$. Squaring both sides of this equation, we obtain $m^2 = 2n^2$, so $(m, n)$ is a positive solution to the Diophantine equation $x^2 = 2y^2$. From the above equality we see that $2 \mid m^2$, which means that $2 \mid m$. But then we can write $m$ as $m = 2m'$ for some integer $m'$. Therefore
\[ m^2 = (2m')^2 = 4m'^2 = 2n^2. \]
Thus we obtain $4m'^2 = 2n^2$, and by cancelling 2 on both sides we get $n^2 = 2m'^2$. Thus from the positive integer solution $(m, n)$ we can obtain another positive integer solution $(n, m')$ to the Diophantine equation $x^2 = 2y^2$. Note that
\[ m' = \frac{1}{2}m = \frac{1}{\sqrt{2}}n < n, \]
so the second coordinate in the solution $(m, n)$ is strictly greater than the second coordinate in the solution $(n, m')$. Thus, if there were a positive integer solution to the Diophantine equation $x^2 = 2y^2$, we could produce an infinite strictly decreasing sequence of positive integers $n > m' > m'' > \cdots$, and this contradicts the fact that the positive integers are bounded below by 1.

Exercise Let $k$ be a positive integer. Prove that the number $\sqrt{k}$ is rational if and only if $k$ is a perfect square.

29 Gaussian Integers

Let $i$ denote one of the complex roots of the polynomial $x^2 + 1$. That is, the number $i$ satisfies the equation $i^2 = -1$. Notice that if $i$ is a root of $x^2 + 1$, then so is $-i$.

Definition A complex number of the form $a + bi$, where $a, b \in \mathbb{Z}$, is called a Gaussian integer. The set of Gaussian integers is denoted by $\mathbb{Z}[i]$.

The notation $\mathbb{Z}[i]$ suggests that the set of Gaussian integers is analogous to the ring of rational integers $\mathbb{Z}$, where we now treat the numbers $i$ and $-i$ as (Gaussian) integers as well. The similarities between the two sets become even more obvious once we note that, just like the set of rational integers $\mathbb{Z}$, the set $\mathbb{Z}[i]$ forms a commutative ring under the standard operations of addition and multiplication.

Proposition The set of Gaussian integers
\[ \mathbb{Z}[i] := \{a + bi : a, b \in \mathbb{Z}\} \]
forms a commutative ring under the standard operations of addition and multiplication.

Proof. Strictly speaking, to prove this result one would have to do the routine verification of the ring axioms and the commutativity. We will leave this part as an exercise. What is worth mentioning is that both $0 = 0 + 0i$ and $1 = 1 + 0i$ are elements of $\mathbb{Z}[i]$, and also that the operations of addition and multiplication are well-defined. That is, for all $a + bi, c + di \in \mathbb{Z}[i]$, their sum, difference and product are elements of $\mathbb{Z}[i]$:
\[ (a + bi) \pm (c + di) = (a \pm c) + (b \pm d)i \in \mathbb{Z}[i]; \]
\[ (a + bi)(c + di) = (ac - bd) + (ad + bc)i \in \mathbb{Z}[i]. \]

Also, note that $\mathbb{Z} \subseteq \mathbb{Z}[i]$, so every rational integer is also a Gaussian integer. We will see that the Gaussian integers are of great help when we try to answer the question of which integers can be represented as a sum of two squares. In other words, we will use Gaussian integers to solve the Diophantine equation $n = a^2 + b^2$, where $n$ is fixed and $a, b \in \mathbb{Z}$ are variables. Note that, if $n \in \mathbb{N}$ is representable as a sum of two squares, then
\[ n = a^2 + b^2 = (a + bi)(a - bi), \]
so we just managed to factor a rational integer $n$, which is also a Gaussian integer, as a product of two Gaussian integers $a + bi$ and $a - bi$.

Definition Let $a, b \in \mathbb{Z}[i]$. We say that $a$ divides $b$, or that $a$ is a factor of $b$, when $b = ak$ for some $k \in \mathbb{Z}[i]$. We write $a \mid b$ if this is the case, and $a \nmid b$ otherwise.

Example For example, $5 = (1 + 2i)(1 - 2i)$, so $(1 + 2i) \mid 5$ and $(1 - 2i) \mid 5$.

One of the most important invariants attached to a Gaussian integer $z$ is its norm, which we denote by $N(z)$.
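The multiplication rule and the definition of divisibility above can be tried out on small examples. The following is only a minimal sketch: it assumes a Gaussian integer $a + bi$ is stored as the pair `(a, b)`, and the helper names `g_mul` and `g_divides` are illustrative, not part of the notes.

```python
# Assumption: the Gaussian integer a + bi is modelled as the pair (a, b).

def g_mul(z, w):
    """Product (a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

def g_divides(z, w):
    """Return True when z divides w in Z[i], i.e. w = z*k for some k in Z[i]."""
    a, b = z
    if (a, b) == (0, 0):
        return w == (0, 0)
    # w / z = w * (a - bi) / (a^2 + b^2), so z divides w exactly when
    # both coordinates of w * (a - bi) are multiples of a^2 + b^2.
    re, im = g_mul(w, (a, -b))
    n = a * a + b * b
    return re % n == 0 and im % n == 0

print(g_mul((1, 2), (1, -2)))     # (5, 0), that is, 5 = (1 + 2i)(1 - 2i)
print(g_divides((1, 2), (5, 0)))  # True
print(g_divides((2, 0), (1, 1)))  # False
```

The divisibility test avoids floating point by multiplying through by $a - bi$ first, so every intermediate value is an ordinary integer.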

Definition The norm function is defined to be
\[ N : \mathbb{Z}[i] \to \mathbb{N} \cup \{0\}, \quad a + bi \mapsto a^2 + b^2. \]

Definition Let $z = a + bi$ be a Gaussian integer. The complex conjugate of $z$ is $\bar{z} = a - bi$. The absolute value of $z$ is
\[ |z| := \sqrt{z\bar{z}} = \sqrt{(a + ib)(a - ib)} = \sqrt{a^2 + b^2}. \]

Note the obvious connection between the norm of a Gaussian integer $z = a + bi$ and the absolute value of $z$:
\[ N(z) = N(a + bi) = a^2 + b^2 = \left(\sqrt{a^2 + b^2}\right)^2 = |z|^2. \]
The norm map has many nice properties. For example, it is multiplicative; that is, the norm of the product of two Gaussian integers is equal to the product of their norms:
\[ N(zw) = |zw|^2 = (|z||w|)^2 = |z|^2|w|^2 = N(z)N(w). \]
We will see the usefulness of this property later. Another important thing to mention is that the only Gaussian integer whose norm is equal to zero is zero itself. That is, $N(z) = 0$ if and only if $z = 0$.

Now comes the time to speak about the geometric interpretation of the Gaussian integers. Consider Figure 1 [39], which depicts the complex plane (the picture is taken from 7d/Gaussian_integer_lattice.png). The Gaussian integers $a + bi$ form a square grid located at points $(a, b)$, where the coordinates are rational integers. If $z = a + bi$ is a Gaussian integer, then the point $(a, -b)$, which corresponds to the complex conjugate $\bar{z} = a - bi$ of $z$, is just the reflection of the point $(a, b)$ across the $x$-axis. In turn, the absolute value $|z|$ represents the distance from the point $(a, b)$ to the origin. Note that it is equal to the distance from the point $(a, -b)$ to the origin.

The next important concept that we need to introduce is the concept of units.

Definition A Gaussian integer $u$ is a unit of $\mathbb{Z}[i]$ when $u \mid w$ for all $w$ in $\mathbb{Z}[i]$.

In other words, the units are those very special numbers that divide every single element of the ring. The notion of a unit does not apply only to the ring of Gaussian integers, but in fact applies to any commutative ring. For example, in $\mathbb{Z}$ the only units are $1$ and $-1$. When talking about the prime factorization of rational

integers, we always omit $\pm 1$. When doing so, we actually mean that the prime factorization is unique up to multiplication by $\pm 1$. We will see that the analogue of the Fundamental Theorem of Arithmetic holds for Gaussian integers, and so every Gaussian integer has a unique prime factorization up to multiplication by a unit. In the next proposition, we prove that the only units in the ring of Gaussian integers are $\pm 1$ and $\pm i$.

Figure 1: Gaussian integers

Proposition The following are equivalent:
1. $z$ is a unit in $\mathbb{Z}[i]$;
2. $N(z) = 1$;
3. $z \in \{1, -1, i, -i\}$;
4. the inverse complex number $z^{-1} := 1/z$ is also a Gaussian integer.

Proof. Suppose that $z$ is a unit. Then $z \mid 1$, since $z$ divides every Gaussian integer. Thus $1 = zw$ for some $w$ in $\mathbb{Z}[i]$. Then
\[ 1 = 1^2 + 0^2 = N(1) = N(zw) = N(z)N(w). \]
Since $N(z)$ and $N(w)$ are positive integers, we deduce that $N(z) = 1$ (and $N(w) = 1$).

[40] Proposition 7.9 in Frank Zorzitto, A Taste of Number Theory.

Suppose that $N(z) = 1$, where $z = a + bi$ for some $a, b \in \mathbb{Z}$. We have $a^2 + b^2 = 1$, which means that $a^2 = 1$, $b = 0$ or $a = 0$, $b^2 = 1$. In the first case, $a = \pm 1$ and $b = 0$, which means that $z = \pm 1$. In the second case, $a = 0$ and $b = \pm 1$, which means that $z = \pm i$.

If $z$ is one of $1, -1, i, -i$, its inverse is $1, -1, -i, i$, respectively, and these are again Gaussian integers.

Finally, suppose that $z$ and $z^{-1}$ are Gaussian integers. If $w$ is any other Gaussian integer, we see that $z \mid w$, because $w = z(z^{-1}w)$ and $z^{-1}w$ is a Gaussian integer.

We will now turn our attention to establishing the analogue of the Fundamental Theorem of Arithmetic in the ring of Gaussian integers. For this purpose, we need to introduce the definition of a Gaussian prime.

Definition Let $z$ be a Gaussian integer. Then $z$ is called a Gaussian prime if it is not a unit and any factorization $z = wu$ in $\mathbb{Z}[i]$ forces $w$ or $u$ to be a unit.

Compare this definition to Definition 2.5, where we introduced the notion of a rational prime. One can notice the similarities, since an ordinary rational prime can be factored in $\mathbb{Z}$ only if one of its factors is a unit, which in the case of $\mathbb{Z}$ means $\pm 1$.

Example The integer 2 is a prime in $\mathbb{Z}$, but it is not a Gaussian prime because
\[ 2 = (1 + i)(1 - i), \]
and neither $1 + i$ nor $1 - i$ is a unit. The number 3, however, is not only a rational prime, but also a Gaussian prime. For suppose that $3 = zw$ for some Gaussian integers $z$ and $w$. Then
\[ N(3) = 9 = N(zw) = N(z)N(w), \]
which means that $N(z) \mid 9$ and $N(w) \mid 9$. If $N(z) = 1$ or $N(w) = 1$, then $z$ or $w$ is a unit by the proposition on units above. Thus the only case we need to eliminate is $N(z) = N(w) = 3$. If $N(z) = 3$ and we let $z = a + bi$, then
\[ 3 = N(z) = N(a + bi) = a^2 + b^2. \]
However, we saw many times that integers congruent to 3 modulo 4 cannot be represented as a sum of two squares, which means that $N(z) \neq 3$. Analogously, $N(w) \neq 3$. But then either $N(z) = 1$ or $N(w) = 1$, which means that either $z$ or $w$ is a unit.
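The definition of a Gaussian prime can be checked directly on small examples such as 2 and 3 above. The sketch below is a brute-force search rather than an efficient test: it assumes Gaussian integers are stored as pairs `(a, b)`, and the function names are illustrative only. It looks for a divisor $w$ with $1 < N(w) < N(z)$; such a $w$ and its cofactor are both non-units, because the norm is multiplicative.

```python
from math import isqrt

def norm(z):
    """N(a + bi) = a^2 + b^2 for z = (a, b)."""
    a, b = z
    return a * a + b * b

def divides(z, w):
    """True when the non-zero Gaussian integer z divides w."""
    a, b = z
    c, d = w
    n = norm(z)
    # w * conj(z) = (ca + db) + (da - cb)i must be a multiple of N(z).
    return (c * a + d * b) % n == 0 and (d * a - c * b) % n == 0

def is_gaussian_prime(z):
    """Brute-force check of the definition (fine for small norms only)."""
    n = norm(z)
    if n <= 1:
        return False  # zero and the units are not primes
    bound = isqrt(n)
    for c in range(-bound, bound + 1):
        for d in range(-bound, bound + 1):
            w = (c, d)
            # A divisor with 1 < N(w) < N(z) gives a factorization
            # into two non-units.
            if 1 < norm(w) < n and divides(w, z):
                return False
    return True

print(is_gaussian_prime((2, 0)))  # False, since 2 = (1 + i)(1 - i)
print(is_gaussian_prime((3, 0)))  # True
print(is_gaussian_prime((1, 1)))  # True
```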

Exercise Prove that every rational prime $p$ such that $p \equiv 3 \pmod{4}$ is a Gaussian prime.

The next step is to establish the analogue of the Remainder Theorem for Gaussian integers.

Proposition If $z, w$ are Gaussian integers and $z \neq 0$, then there exist Gaussian integers $q$ and $r$ such that
\[ w = qz + r, \]
where $N(r) < N(z)$.

Proof. Recall the geometric interpretation of the Gaussian integers, given in Figure 1. The complex number $w/z$ is located somewhere on the complex plane $\mathbb{C}$. This $w/z$ need not be a Gaussian integer. However, as Figure 2 demonstrates, it falls into one of the unit squares whose vertices are Gaussian integers.

Figure 2: Complex number in a square with Gaussian integers as vertices

We pick our Gaussian integer $q$ so that the distance between the point corresponding to $q$ and the point corresponding to $w/z$ is the smallest. By inspection, we can see that such a Gaussian integer $q$ must satisfy
\[ \left|\frac{w}{z} - q\right| \leq \frac{1}{\sqrt{2}}. \]
Indeed, $w/z$ has to lie in one of the four boxes around $q$ shown in Figure 3, and the diagonal of each box has length
\[ \sqrt{\left(\frac{1}{2}\right)^2 + \left(\frac{1}{2}\right)^2} = \frac{1}{\sqrt{2}}. \]

Figure 3: Gaussian integer closest to a given complex number

We conclude that
\[ \left|\frac{w}{z} - q\right|^2 \leq \frac{1}{2} < 1. \]
Therefore
\[ \left|\frac{w - zq}{z}\right|^2 < 1, \]
and so $|w - zq|^2 < |z|^2$, which is the same as $N(w - zq) < N(z)$. Put $r := w - zq$, and obtain $w = zq + r$, where $N(r) < N(z)$.

Example Let us see how the Remainder Theorem for Gaussian integers works. Let $w = 4 + 7i$ and $z = 1 - 3i$. Then
\[ \frac{w}{z} = \frac{4 + 7i}{1 - 3i} = \frac{(4 + 7i)(1 + 3i)}{(1 - 3i)(1 + 3i)} = \frac{-17 + 19i}{10} = -1.7 + 1.9i. \]
We see that the nearest integer point to $(-1.7, 1.9)$ is $(-2, 2)$. Thus $q = -2 + 2i$. Then
\[ r = w - qz = (4 + 7i) - (1 - 3i)(-2 + 2i) = -i. \]
We conclude that
\[ 4 + 7i = (-2 + 2i)(1 - 3i) - i. \]
Note that $N(-i) = 1 < 10 = N(z)$.

[41] Proposition 7.11 in Frank Zorzitto, A Taste of Number Theory.
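The rounding argument in the proof translates directly into a procedure. The following sketch assumes Gaussian integers are stored as pairs `(a, b)`; the name `g_divmod` is illustrative. It rounds both coordinates of $w/z$ to a nearest integer using integer arithmetic only, and reproduces the example with $w = 4 + 7i$ and $z = 1 - 3i$.

```python
def g_divmod(w, z):
    """Return (q, r) with w = q*z + r and N(r) < N(z), for z != 0."""
    a, b = w
    c, d = z
    n = c * c + d * d               # N(z)
    # w / z = w * conj(z) / N(z) = (re + im*i) / n
    re = a * c + b * d
    im = b * c - a * d
    # floor(t/n + 1/2) is a nearest integer to t/n.
    q0 = (2 * re + n) // (2 * n)
    q1 = (2 * im + n) // (2 * n)
    # r = w - q*z, where q*z = (q0*c - q1*d) + (q0*d + q1*c)i
    r = (a - (q0 * c - q1 * d), b - (q0 * d + q1 * c))
    return (q0, q1), r

q, r = g_divmod((4, 7), (1, -3))
print(q, r)  # (-2, 2) (0, -1), that is, q = -2 + 2i and r = -i
```

When $w/z$ is equidistant from several Gaussian integers the quotient is not unique; this sketch simply rounds halves upward in each coordinate.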

We will now prove the analogue of Bézout's lemma for Gaussian integers. For $a, b \in \mathbb{Z}[i]$, we call a Gaussian integer $ax + by$ with $x, y \in \mathbb{Z}[i]$ a Gaussian combination of $a$ and $b$. In the following proposition, it is crucial that for every Gaussian integer $a$ the value $N(a)$ is always non-negative.

Proposition Let $a, b$ be Gaussian integers such that $a \neq 0$ or $b \neq 0$. If $d$ is a non-zero Gaussian combination of $a$ and $b$ such that $N(d)$ is minimal, then $d$ divides every Gaussian combination of $a$ and $b$.

Proof. Write $d = ax + by$ with $x, y \in \mathbb{Z}[i]$, where $N(d) > 0$ is minimal. Now consider some Gaussian combination $c = as + bt$, where $s, t \in \mathbb{Z}[i]$. We want to show that $d \mid c$. By Proposition 29.12, $c = dq + r$ for some $q, r \in \mathbb{Z}[i]$, where $N(r) < N(d)$. Thus
\[ r = c - dq = as + bt - (ax + by)q = a(s - xq) + b(t - yq). \]
We see that $r$ is a Gaussian combination of $a$ and $b$ such that $N(r) < N(d)$. Because $d$ is the non-zero Gaussian combination of $a$ and $b$ such that $N(d)$ is minimal, the only option is that $N(r) = 0$, so $r = 0$. Hence $d \mid c$. In particular, $d \mid a$ and $d \mid b$, because $a, b$ are Gaussian combinations of $a$ and $b$.

Definition A Gaussian integer $d = ax + by$ such that $x, y$ are Gaussian integers, $d \mid a$ and $d \mid b$ is called a greatest common divisor of the Gaussian integers $a$ and $b$.

Exercise Let $d_1$ and $d_2$ be greatest common divisors of Gaussian integers $a$ and $b$. Prove that $d_1 = ud_2$ for some unit $u$ in $\mathbb{Z}[i]$.

Finally, we prove the analogue of Euclid's lemma for Gaussian integers.

Proposition If $p$ is a Gaussian prime and $p \mid zw$ for some Gaussian integers $z, w$, then $p \mid z$ or $p \mid w$.

[42] Proposition 7.13 in Frank Zorzitto, A Taste of Number Theory.

Proof. Suppose that $p \nmid z$. We will show that $p \mid w$. Let $u$ be a greatest common divisor of $p$ and $z$. Thus $u = pt + zs$ for some $t, s \in \mathbb{Z}[i]$ and $u \mid p$, $u \mid z$. Write $p = uk$ for some $k \in \mathbb{Z}[i]$. Since $p$ is a Gaussian prime, one of $u$ or $k$ is a unit in $\mathbb{Z}[i]$. If $k$ is a unit, then $u = pk^{-1}$ with $k^{-1} \in \mathbb{Z}[i]$, and so $p \mid u$. Since $p \mid u$ and $u \mid z$, it must be that $p \mid z$, contrary to our assumption. Thus $u$ is a unit with inverse $u^{-1} \in \mathbb{Z}[i]$. Now multiply $u = pt + zs$ by $wu^{-1}$:
\[ w = ptwu^{-1} + zswu^{-1}. \]
Clearly, $p \mid ptwu^{-1}$, and we are given that $p \mid zw$. Thus $p \mid w$.

Exercise Compute the quotient and the remainder after division of $w$ by $z$, when
\[ (w, z) = (6 + i, 2 - i),\ (27 - 5i, 3 - 7i),\ (4 + 7i, 8 - i). \]

Exercise Let $\omega$ denote the primitive third root of unity. That is,
\[ \omega = e^{2\pi i/3} = \frac{-1 + \sqrt{3}\,i}{2}. \]
Note that $\omega$ satisfies the equation $\omega^2 + \omega + 1 = 0$. The set
\[ \mathbb{Z}[\omega] := \{a + b\omega : a, b \in \mathbb{Z}\} \]
is called the ring of Eisenstein integers. For any Eisenstein integer $\alpha = a + b\omega$, where $a, b \in \mathbb{Z}$, the norm map is defined by
\[ N(a + b\omega) := a^2 - ab + b^2. \tag{8} \]
Just like the ring of Gaussian integers, the ring of Eisenstein integers is a Unique Factorization Domain. Geometrically, Eisenstein integers form a lattice on the complex plane (see Figure 4).

1. Prove that $\mathbb{Z}[\omega]$ is a ring by showing that $0, 1 \in \mathbb{Z}[\omega]$, and for all $\alpha, \beta \in \mathbb{Z}[\omega]$ it is the case that $\alpha \pm \beta \in \mathbb{Z}[\omega]$ and $\alpha\beta \in \mathbb{Z}[\omega]$;
2. Prove that the norm map defined in (8) is multiplicative. That is, for every $\alpha, \beta \in \mathbb{Z}[\omega]$ it is the case that $N(\alpha\beta) = N(\alpha)N(\beta)$. Explain why $N(\alpha) \geq 0$ for every $\alpha \in \mathbb{Z}[\omega]$ and why $N(\alpha) = 0$ if and only if $\alpha = 0$;
3. We say that $\upsilon \in \mathbb{Z}[\omega]$ is a unit if $\upsilon \mid \alpha$ for every $\alpha \in \mathbb{Z}[\omega]$. Prove that $\upsilon \in \mathbb{Z}[\omega]$ is a unit if and only if $N(\upsilon) = 1$;

4. Find all units in $\mathbb{Z}[\omega]$.

Figure 4: Eisenstein integers

Exercise Let
\[ \mathbb{Z}[\sqrt{2}] := \left\{a + b\sqrt{2} : a, b \in \mathbb{Z}\right\}. \]
We say that $\upsilon \in \mathbb{Z}[\sqrt{2}]$ is a unit if $\upsilon \mid \alpha$ for every $\alpha \in \mathbb{Z}[\sqrt{2}]$. Prove that there are infinitely many units in $\mathbb{Z}[\sqrt{2}]$.

Hint: Consider the Pell equation $x^2 - 2y^2 = \pm 1$. Explain why, for every $(x_1, y_1)$ satisfying this Diophantine equation, the value $x_1 + y_1\sqrt{2}$ is a unit in $\mathbb{Z}[\sqrt{2}]$. Find any solution $(x_1, y_1)$, and then prove that, for every positive integer $n$, the integer coefficients $x_n$ and $y_n$ of the number
\[ x_n + y_n\sqrt{2} := (x_1 + y_1\sqrt{2})^n \]
also satisfy the equation $x_n^2 - 2y_n^2 = \pm 1$.

Exercise Consider the ring
\[ \mathbb{Z}[\sqrt{-13}] = \{a + b\sqrt{-13} : a, b \in \mathbb{Z}\}. \]
For every $a, b \in \mathbb{Z}$, the norm map on $\mathbb{Z}[\sqrt{-13}]$ is defined by
\[ N(a + b\sqrt{-13}) := a^2 + 13b^2. \]
You may assume that the norm is multiplicative. We will show that unique factorization fails in $\mathbb{Z}[\sqrt{-13}]$. To solve this problem, you might want to refer to Section 2.3 in Frank Zorzitto, A Taste of Number Theory.

1. Prove that the only units of $\mathbb{Z}[\sqrt{-13}]$ are $\pm 1$.
Hint: Let $\upsilon = a + b\sqrt{-13}$ for $a, b \in \mathbb{Z}$. By definition, $\upsilon \in \mathbb{Z}[\sqrt{-13}]$ is a unit if $\upsilon \mid \alpha$ for every $\alpha \in \mathbb{Z}[\sqrt{-13}]$. Thus, in particular, $\upsilon \mid 1$. Explain why this fact implies the equality $a^2 + 13b^2 = 1$. What are the solutions to this Diophantine equation?

2. We say that a non-zero number $\gamma \in \mathbb{Z}[\sqrt{-13}]$ is prime if the factorization $\gamma = \alpha\beta$ for $\alpha, \beta \in \mathbb{Z}[\sqrt{-13}]$ implies that either $\alpha$ is a unit or $\beta$ is a unit. Prove that the numbers $2$, $7$, and $1 \pm \sqrt{-13}$ are prime in $\mathbb{Z}[\sqrt{-13}]$;
3. Using Part 2, explain why unique factorization fails in $\mathbb{Z}[\sqrt{-13}]$.

30 Fermat's Theorem on Sums of Two Squares

We will now turn our attention to the Diophantine equation $n = a^2 + b^2$, where $n$ is a fixed positive integer and $a, b$ are integer variables. On December 25th 1640, Fermat sent the proof of the following theorem to Mersenne, which is why in some sources it is called Fermat's Christmas Theorem. This theorem will allow us to explain which positive integers are representable as a sum of two squares, and how many solutions the equation $n = a^2 + b^2$ has.

Theorem (Fermat's Theorem on Sums of Two Squares) [43] If $p$ is a rational odd prime and $p \equiv 1 \pmod{4}$, then $p = a^2 + b^2$ for some rational integers $a$ and $b$.

Proof. (Richard Dedekind, circa 1894) Since $p \equiv 1 \pmod{4}$, it follows from an earlier corollary that $-1$ is a quadratic residue modulo $p$. Thus $-1 \equiv x^2 \pmod{p}$ for some rational integer $x$. Thus $p \mid x^2 + 1$ in $\mathbb{Z}$, and so $p \mid (x + i)(x - i)$ in $\mathbb{Z}[i]$. Now note that $p \nmid x + i$, for if we assume that $x + i = p(c + di)$ for some Gaussian integer $c + di$, then by equating the imaginary parts we get $pd = 1$, which contradicts the fact that $p > 1$. Likewise, $p \nmid x - i$. Since $p$ divides a product without dividing either of the factors, Euclid's lemma for Gaussian integers tells us that $p$ is not a Gaussian prime. Thus $p = uv$, where $u, v \in \mathbb{Z}[i]$ are not units. But then
\[ p^2 = N(p) = N(uv) = N(u)N(v), \]
so $N(u) = 1$, $p$ or $p^2$. If $N(u) = 1$, then $u$ is a unit. If $N(u) = p^2$, then $N(v) = 1$, so $v$ is a unit. Hence $N(u) = N(v) = p$. But if we now write $u = a + bi$, then
\[ p = N(u) = N(a + bi) = a^2 + b^2, \]
so $p$ is a sum of two squares of rational integers.

[43] Theorem 7.14 in Frank Zorzitto, A Taste of Number Theory.
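The proof shows that a factor $u$ of $p$ with $N(u) = p$ exists, and one standard way to actually find it is to compute a greatest common divisor of $p$ and $x + i$ with the Euclidean algorithm in $\mathbb{Z}[i]$. The sketch below follows that approach under several assumptions: Gaussian integers are pairs `(a, b)`, the helper names are illustrative, and a square root of $-1$ modulo $p$ is found by trying small bases $g$ and testing $x = g^{(p-1)/4} \bmod p$ (which works whenever $g$ is a quadratic non-residue).

```python
def g_rem(w, z):
    """Remainder r of w on division by z in Z[i], with N(r) < N(z)."""
    a, b = w
    c, d = z
    n = c * c + d * d
    re, im = a * c + b * d, b * c - a * d        # w * conj(z)
    q0 = (2 * re + n) // (2 * n)                 # nearest integers
    q1 = (2 * im + n) // (2 * n)
    return (a - (q0 * c - q1 * d), b - (q0 * d + q1 * c))

def g_gcd(w, z):
    """Euclidean algorithm in Z[i]; the result is defined up to a unit."""
    while z != (0, 0):
        w, z = z, g_rem(w, z)
    return w

def sqrt_minus_one(p):
    """Find x with x^2 = -1 (mod p), assuming p is a prime, p = 1 (mod 4)."""
    for g in range(2, p):
        x = pow(g, (p - 1) // 4, p)
        if (x * x + 1) % p == 0:
            return x
    raise ValueError("p must be a prime congruent to 1 modulo 4")

def two_squares(p):
    """Return (a, b) with a^2 + b^2 = p for a prime p = 1 (mod 4)."""
    x = sqrt_minus_one(p)
    a, b = g_gcd((p, 0), (x, 1))
    return abs(a), abs(b)

for p in (5, 13, 17, 29, 97):
    a, b = two_squares(p)
    print(p, "=", a, "^2 +", b, "^2")
```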

Now we know that, when $p$ is an odd prime, the equation $p = x^2 + y^2$ has a solution in positive integers $x$ and $y$ if and only if $p \equiv 1 \pmod{4}$. Notice that it also has a solution when $p = 2$, because $2 = 1^2 + 1^2$. We would now like to generalize this result to all positive integers $n$. For this purpose, we need to prove the following lemma.

Lemma If $p$ in $\mathbb{Z}[i]$ is a Gaussian prime and $p^k \mid uv$ for some Gaussian integers $u$ and $v$ and exponent $k \geq 1$, then there are exponents $j, l \in \{0, 1, \ldots, k\}$ such that $p^j \mid u$, $p^l \mid v$ and $j + l = k$.

Proof. We will prove this statement using the principle of mathematical induction.

Base case. For $k = 1$, the result is equivalent to Euclid's lemma for Gaussian integers.

Induction hypothesis. Suppose that the statement is true for $k - 1$, where $k \geq 2$.

Induction step. Let $p^k \mid uv$. Then $p \mid u$ or $p \mid v$. Suppose that $p \mid v$. Write $v = wp$ for some $w$ in $\mathbb{Z}[i]$. Then $p^k \mid uwp$, which means that $p^{k-1} \mid uw$. According to the induction hypothesis, there exist integers $j$ and $m$, $0 \leq j, m \leq k - 1$, such that $p^j \mid u$, $p^m \mid w$, and $j + m = k - 1$. But then $p^{m+1} \mid wp = v$. If we now put $l = m + 1$, then $p^j \mid u$, $p^l \mid v$, and $j + l = k$, as claimed.

Proposition [44] Let $n$ be a positive integer. The Diophantine equation $n = x^2 + y^2$ has a solution if and only if $n$ has the prime factorization
\[ n = 2^t p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} q_1^{2f_1} q_2^{2f_2} \cdots q_l^{2f_l}, \]
where $p_j \equiv 1 \pmod{4}$ for all $j = 1, 2, \ldots, k$ and $q_j \equiv 3 \pmod{4}$ for all $j = 1, 2, \ldots, l$.

Proof. Let $w = a + bi$ and $z = c + di$ be Gaussian integers. Since the norm map is multiplicative, it is the case that
\[ (a^2 + b^2)(c^2 + d^2) = N(w)N(z) = N(wz) = N((ac - bd) + (ad + bc)i) = (ac - bd)^2 + (ad + bc)^2. \]
The identity above allows us to conclude that the product $mn$ of any two numbers $m = a^2 + b^2$ and $n = c^2 + d^2$ will be representable as a sum of two squares as well:
\[ mn = (a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2. \]

[44] Proposition 7.16 in Frank Zorzitto, A Taste of Number Theory.

Since 2 is representable as a sum of two squares, as well as any odd prime $p \equiv 1 \pmod{4}$, we conclude that every integer $n$ with the prime factorization
\[ n = 2^t p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}, \]
where $p_j \equiv 1 \pmod{4}$ for all $j = 1, 2, \ldots, k$, is representable as a sum of two squares.

We know that for every rational prime $q \equiv 3 \pmod{4}$ the Diophantine equation $q^{2f+1} = a^2 + b^2$ has no solutions for every non-negative integer $f$, because $q^{2f+1} \equiv 3 \pmod{4}$. However, every even power of $q$ is representable as a sum of two squares, because $q^{2f} = (q^f)^2 + 0^2$ for every positive integer $f$. But then once again we can use the identity
\[ (a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2 \]
to conclude that every integer $n$ with the prime factorization
\[ n = 2^t p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} q_1^{2f_1} q_2^{2f_2} \cdots q_l^{2f_l}, \]
where $p_j \equiv 1 \pmod{4}$ for all $j = 1, 2, \ldots, k$ and $q_j \equiv 3 \pmod{4}$ for all $j = 1, 2, \ldots, l$, is representable as a sum of two squares.

We will now show that these are the only numbers representable as a sum of two squares. To prove this fact, all that we have to do is to show that, whenever $n = x^2 + y^2$ and some prime $q \equiv 3 \pmod{4}$ satisfies $n = q^k m$, where $m$ is an integer such that $q \nmid m$, then the exponent $k$ has to be even. We see that
\[ q^k \mid x^2 + y^2 = (x + yi)(x - yi). \]
Since every rational prime $q \equiv 3 \pmod{4}$ is also a Gaussian prime, it follows from Lemma 30.2 that there exist integers $j$ and $l$, $0 \leq j, l \leq k$, such that $j + l = k$, $q^j \mid (x + yi)$ and $q^l \mid (x - yi)$. Suppose that $j \geq l$. Then
\[ x + yi = q^j(c + di) \]
for some integers $c$ and $d$. Therefore $x + yi = q^jc + q^jdi$, which means that $x = q^jc$ and $y = q^jd$. But then
\[ n = x^2 + y^2 = q^{2j}c^2 + q^{2j}d^2 = q^{2j}(c^2 + d^2). \]
Since $j \geq l$, we see that $2j = j + j \geq j + l = k$, and since $q^k$ is the highest power of $q$ that divides $n$ and $q^{2j} \mid n$, we must conclude that $k = 2j$, which is an even number.
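The criterion just proved is easy to test numerically against a direct search. The sketch below is a sanity check rather than part of the proof; both function names are illustrative. One function searches for $x, y$ with $n = x^2 + y^2$, the other checks by trial division that every prime $q \equiv 3 \pmod{4}$ divides $n$ to an even power.

```python
from math import isqrt

def is_sum_of_two_squares(n):
    """Brute force: does n = x^2 + y^2 have an integer solution?"""
    for x in range(isqrt(n) + 1):
        y2 = n - x * x
        if isqrt(y2) ** 2 == y2:
            return True
    return False

def satisfies_criterion(n):
    """True when every prime q = 3 (mod 4) occurs in n to an even power."""
    q = 2
    while q * q <= n:
        e = 0
        while n % q == 0:   # only primes can divide here, since smaller
            n //= q         # prime factors have already been removed
            e += 1
        if q % 4 == 3 and e % 2 == 1:
            return False
        q += 1
    # What is left is 1 or a single prime with exponent 1.
    return n % 4 != 3

print(all(is_sum_of_two_squares(n) == satisfies_criterion(n)
          for n in range(1, 2000)))  # True
```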

Now that we know for which positive integers $n$ the Diophantine equation $n = x^2 + y^2$ has a solution, there are only two questions left for us to discuss, namely how many solutions there are and how one computes them. Let $r_2(n)$ denote the number of integer solutions to $n = x^2 + y^2$, where $x, y \in \mathbb{Z}$ are allowed to be positive, negative or zero. As it turns out,
\[ r_2(n) = 4(d_1(n) - d_3(n)), \]
where $d_1(n)$ and $d_3(n)$ denote the number of divisors of $n$ congruent to 1 and 3 modulo 4, respectively. This formula can also be rewritten as follows:
\[ r_2(n) = 4 \sum_{\substack{d \mid n \\ d \equiv 1, 3 \pmod{4}}} (-1)^{\frac{d-1}{2}}. \]
From this formula it follows that for every prime $p \equiv 1 \pmod{4}$ the Diophantine equation $p = x^2 + y^2$ has exactly 8 integer solutions, because both divisors $1$ and $p$ are congruent to 1 modulo 4 and so $r_2(p) = 4(2 - 0) = 8$. If $(x, y)$ is one of them, then the other seven are obtained by changing signs and interchanging the coordinates: $(\pm x, \pm y)$ and $(\pm y, \pm x)$.

As for the computation of the actual solutions, when $p \equiv 1 \pmod{4}$ is prime, the computation of $x$ and $y$ such that $p = x^2 + y^2$ basically reduces to finding a square root of $-1$ modulo $p$. This can be done efficiently, for example using the Tonelli-Shanks Algorithm. If $z$ is an integer such that $z^2 \equiv -1 \pmod{p}$, then one can use the Euclidean algorithm for Gaussian integers to compute $x + yi = \gcd(z + i, p)$. In order to find a solution to $n = x^2 + y^2$ for a composite integer $n$, one would have to factor $n$ first, and as we know integer factorization is a difficult task. In fact, as we saw in Assignment 3, the ability to represent a composite integer $n$ as a sum of two squares in two different ways yields a non-trivial factorization of $n$. Such a method of factorization is called the Euler Factorization Method. Leonhard Euler used this method to factor an integer by knowing two different representations of it as a sum of two squares.

Exercise Consider the setup of the exercise on Eisenstein integers above. We say that $\gamma \neq 0$ is an Eisenstein prime if the factorization $\gamma = \alpha\beta$ for $\alpha, \beta \in \mathbb{Z}[\omega]$ implies that either $\alpha$ is a unit or $\beta$ is a unit.

1. Prove that every rational prime $p \equiv 2 \pmod{3}$ is also an Eisenstein prime. Hint: See the example showing that 3 is a Gaussian prime.

2. Note that $3 = (1 - \omega)(1 - \omega^2)$, so 3 is not an Eisenstein prime. Also, it can be shown that every rational prime $p \equiv 1 \pmod{3}$ is not an Eisenstein prime. Use this fact, as well as Parts (a) and (b), to show that every integer $n$ with the prime factorization
\[ n = 3^t p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} q_1^{2f_1} q_2^{2f_2} \cdots q_l^{2f_l}, \]
where $p_i \equiv 1 \pmod{3}$ for all $i = 1, 2, \ldots, k$ and $q_j \equiv 2 \pmod{3}$ for all $j = 1, 2, \ldots, l$, admits a non-trivial solution $(x, y)$ to the Diophantine equation $n = x^2 - xy + y^2$.

31 Continued Fractions

Definition Let $a_0, a_1, \ldots, a_N$ be non-zero real numbers. Define the continued fraction $[a_0, a_1, \ldots, a_N]$ by
\[ [a_0, a_1, \ldots, a_N] := a_0 + \cfrac{1}{a_1 + \cfrac{1}{\ddots + \cfrac{1}{a_N}}}. \]
The numbers $a_0, a_1, \ldots, a_N$ are called partial coefficients of $[a_0, a_1, \ldots, a_N]$. If $n$ is an integer such that $0 \leq n \leq N$, the continued fraction $[a_0, a_1, \ldots, a_n]$ is called the $n$-th convergent to $[a_0, a_1, \ldots, a_N]$.

Example Let us determine the value of $[\sqrt{2}, \sqrt{2}, \sqrt{2}]$. We have
\[ [\sqrt{2}, \sqrt{2}, \sqrt{2}] = \sqrt{2} + \cfrac{1}{\sqrt{2} + \cfrac{1}{\sqrt{2}}} = \sqrt{2} + \frac{\sqrt{2}}{3} = \frac{4\sqrt{2}}{3}. \]
Also, we see that
\[ \frac{4\sqrt{2}}{3} = 1 + \cfrac{1}{1 + \cfrac{1}{\frac{7}{2} + 3\sqrt{2}}} = \left[1, 1, \tfrac{7}{2} + 3\sqrt{2}\right]. \]
Thus several continued fractions can correspond to the same number. Some continued fractions, like
\[ \frac{4\sqrt{2}}{3} = [1, 1, 7, 1, 2, 1, 7, 1, 2, 1, \ldots], \]

appear to be periodic, while some continued fractions, like
\[ \sqrt[3]{3} = [1, 2, 3, 1, 4, 1, 5, 1, 1, 6, 2, 5, 8, \ldots] \]
seem to be aperiodic. They can also be infinite. Certain numbers have quite elegant continued fraction expansions. For example,
\[ \tan(1) = [1, 1, 1, 3, 1, 5, 1, 7, 1, 9, 1, 11, 1, 13, \ldots]. \]
Note that, according to our definition, there are no limitations on $a_0, a_1, \ldots, a_N$, aside from the fact that they all have to be non-zero real numbers. These numbers can be positive or negative, rational or irrational, algebraic or transcendental.

Exercise Compute $[1, -2, 3, -4, 5]$ and $[\sqrt{5}, 2\sqrt{5}, 3\sqrt{5}]$. Give an example of a continued fraction of 3 2 with at least five terms.

Some elementary properties of continued fractions are
\[ [a_0, a_1, \ldots, a_n] = \left[a_0, a_1, \ldots, a_{n-1} + \frac{1}{a_n}\right], \]
\[ [a_0, a_1, \ldots, a_n] = [a_0, [a_1, \ldots, a_n]], \]
and, more generally,
\[ [a_0, a_1, \ldots, a_n] = [a_0, a_1, \ldots, a_{m-1}, [a_m, \ldots, a_n]]. \]

Proposition Let $a_0, a_1, \ldots, a_N$ be non-zero real numbers. For a non-negative integer $n \leq N$, define the numbers $p_n$ and $q_n$ by
\[ p_0 = a_0, \quad q_0 = 1, \quad p_1 = a_1a_0 + 1, \quad q_1 = a_1, \]
\[ p_n = a_np_{n-1} + p_{n-2}, \quad q_n = a_nq_{n-1} + q_{n-2} \quad \text{for } n \geq 2. \]
Then $[a_0, a_1, \ldots, a_n] = p_n/q_n$.

Proof. We will prove this statement using the principle of mathematical induction.

Base case. Clearly, we have
\[ [a_0] = a_0 = \frac{a_0}{1} = \frac{p_0}{q_0}, \quad [a_0, a_1] = \frac{a_0a_1 + 1}{a_1} = \frac{p_1}{q_1}, \]

so the result holds for $n = 0, 1$.

Induction hypothesis. Suppose that the statement is true for $n = m - 1, m$, where $m < N$.

Induction step. We will show that the result holds for $n = m + 1$. We have
\begin{align*}
[a_0, a_1, \ldots, a_{m+1}] &= \left[a_0, a_1, \ldots, a_m + \frac{1}{a_{m+1}}\right] \\
&= \frac{\left(a_m + \frac{1}{a_{m+1}}\right)p_{m-1} + p_{m-2}}{\left(a_m + \frac{1}{a_{m+1}}\right)q_{m-1} + q_{m-2}} \\
&= \frac{a_{m+1}(a_mp_{m-1} + p_{m-2}) + p_{m-1}}{a_{m+1}(a_mq_{m-1} + q_{m-2}) + q_{m-1}} \\
&= \frac{a_{m+1}p_m + p_{m-1}}{a_{m+1}q_m + q_{m-1}} \\
&= \frac{p_{m+1}}{q_{m+1}}.
\end{align*}

Proposition For any positive integer $n$, it is the case that
\[ p_nq_{n-1} - p_{n-1}q_n = (-1)^{n-1} \]
or, equivalently,
\[ \frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} = \frac{(-1)^{n-1}}{q_nq_{n-1}}. \]

Proof. See Assignment 6.

Proposition For any integer $n \geq 2$, it is the case that
\[ p_nq_{n-2} - p_{n-2}q_n = (-1)^na_n \]
or, equivalently,
\[ \frac{p_n}{q_n} - \frac{p_{n-2}}{q_{n-2}} = \frac{(-1)^na_n}{q_nq_{n-2}}. \]

Proof. The result follows from Proposition 31.5:
\begin{align*}
\frac{p_n}{q_n} - \frac{p_{n-2}}{q_{n-2}} &= \frac{a_np_{n-1} + p_{n-2}}{a_nq_{n-1} + q_{n-2}} - \frac{p_{n-2}}{q_{n-2}} \\
&= \frac{q_{n-2}(a_np_{n-1} + p_{n-2}) - p_{n-2}(a_nq_{n-1} + q_{n-2})}{q_{n-2}(a_nq_{n-1} + q_{n-2})} \\
&= \frac{a_n(p_{n-1}q_{n-2} - p_{n-2}q_{n-1})}{q_nq_{n-2}} \\
&= \frac{(-1)^na_n}{q_nq_{n-2}}.
\end{align*}

Proposition Let $N$ be a positive integer. Suppose that $a_0 \in \mathbb{Z}$ and $a_1, a_2, \ldots, a_N \in \mathbb{N}$. Also, let $x_n = p_n/q_n$. Then the following hold:

1. It is the case that
\[ x_0 < x_2 < x_4 < \cdots \]
and
\[ x_1 > x_3 > x_5 > \cdots; \]
2. Furthermore, every odd convergent is greater than any even convergent. That is, $x_{2k+1} > x_{2l}$ for any $k$ and $l$;
3. The $N$-th convergent $x_N$ is greater than any other even convergent and less than any other odd convergent.

Proof. Let us prove property 1. If $n$ is even, then it follows from Proposition 31.6 that
\[ x_n - x_{n-2} = \frac{p_n}{q_n} - \frac{p_{n-2}}{q_{n-2}} = \frac{a_n}{q_nq_{n-2}} > 0. \]
Therefore
\[ x_{n-2} = \frac{p_{n-2}}{q_{n-2}} < \frac{p_n}{q_n} = x_n \]
for all even $n$. Analogously, one can show that $x_{n-2} > x_n$ for all odd $n$.

To establish property 2, recall that by Proposition 31.5 we have $x_{2k+1} > x_{2k}$ for all non-negative $k$. If $l \leq k$, then $x_{2k} \geq x_{2l}$, so $x_{2k+1} > x_{2l}$. If $l > k$, then $x_{2l} < x_{2l+1}$; since $x_{2k+1} > x_{2l+1}$, it follows that $x_{2k+1} > x_{2l}$.

Finally, to see that property 3 holds, we note that if $N$ is even then by property 1 we have $x_0 < x_2 < \cdots < x_N$. Thus $x_N$ is greater than any other even convergent. On the other hand, by property 2, every even convergent, including $x_N$, is less than every odd convergent. The result then follows for all even $N$, and similarly one can also argue that it is true when $N$ is odd.

Example Let us see an example of the phenomenon described in the proposition above. Consider the following continued fraction expansion of $\sqrt{7}$:
\[ \sqrt{7} = [2, 1, 1, 1, 4, 1, 1, 1, 4, 1, \ldots] \approx 2.64575. \]
The first 10 convergents of $\sqrt{7}$ are
\[ 2,\ 3,\ \frac{5}{2},\ \frac{8}{3},\ \frac{37}{14},\ \frac{45}{17},\ \frac{82}{31},\ \frac{127}{48},\ \frac{590}{223},\ \frac{717}{271}. \]
We see that
\[ 2 < \frac{5}{2} < \frac{37}{14} < \frac{82}{31} < \cdots < \sqrt{7} < \cdots < \frac{717}{271} < \frac{127}{48} < \frac{45}{17} < \frac{8}{3} < 3. \]
All the $n$-th convergents to the left of $\sqrt{7}$ correspond to even $n$ and all the $n$-th convergents to the right of $\sqrt{7}$ correspond to odd $n$.

Now let $\alpha$ be a real number. We construct a canonical continued fraction expansion of $\alpha$ as follows:

Step 1. Define $a_0 := \lfloor\alpha\rfloor$. If $\alpha = a_0$ then $\alpha = [a_0]$. Otherwise let $\alpha = a_0 + 1/\alpha_1$ for some $\alpha_1$.

Step 2. Let $a_1 = \lfloor\alpha_1\rfloor$. If $\alpha_1 = a_1$ then $\alpha = a_0 + 1/a_1 = [a_0, a_1]$. Otherwise let $\alpha = a_0 + 1/(a_1 + 1/\alpha_2)$ for some $\alpha_2$.

We repeat this procedure. If it stops after a finite number of steps then $\alpha = [a_0, \ldots, a_N]$. Otherwise $\alpha = [a_0, a_1, \ldots]$ has an infinite continued fraction expansion.

Example Let us determine the first five terms in the canonical continued fraction expansion of $\pi = 3.14159\ldots$, as well as the first five convergents of $\pi$.

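Step 1. Define $a_0 := \lfloor \pi \rfloor = 3$. Then

$$\pi = [3, \alpha_1] = 3 + \frac{1}{\alpha_1},$$

where $\alpha_1 = 1/(\pi - 3) = 7.0625\ldots$

Step 2. Define $a_1 := \lfloor \alpha_1 \rfloor = 7$. Then

$$\pi = [3, 7, \alpha_2] = 3 + \cfrac{1}{7 + \cfrac{1}{\alpha_2}},$$

where $\alpha_2 = 1/(\alpha_1 - 7) = 15.9965\ldots$ We see that the first convergent to $\pi$ is

$$\frac{p_1}{q_1} = a_0 + \frac{1}{a_1} = 3 + \frac{1}{7} = \frac{22}{7}.$$

Step 3. Define $a_2 := \lfloor \alpha_2 \rfloor = 15$. Then

$$\pi = [3, 7, 15, \alpha_3] = 3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{\alpha_3}}},$$

where $\alpha_3 = 1/(\alpha_2 - 15) = 1.0034\ldots$ We see that the second convergent to $\pi$ is

$$\frac{p_2}{q_2} = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2}} = \frac{333}{106}.$$

Proceeding in the same fashion, we see that

$$\pi = [3, 7, 15, 1, 292, 1, \ldots],$$

and the first five convergents of $\pi$ are

$$\frac{22}{7},\ \frac{333}{106},\ \frac{355}{113},\ \frac{103993}{33102},\ \frac{104348}{33215} = 3.14159265\ldots$$

Exercise. Determine the first five terms in the canonical continued fraction expansion of Euler's number $e = 2.71828\ldots$, as well as the first five convergents of $e$.

Exercise. Prove that $\alpha$ has a finite canonical continued fraction expansion if and only if $\alpha$ is a rational number.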
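The step-by-step procedure used for $\pi$ can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function names `continued_fraction` and `convergents` are made up for this sketch, and the floating-point value `math.pi` is assumed to be accurate enough for the first few terms (rounding errors eventually corrupt later terms). Passing a `Fraction` instead of a float gives exact arithmetic and a finite expansion.

```python
from fractions import Fraction
import math


def continued_fraction(alpha, n_terms):
    """Return up to n_terms partial quotients a_0, a_1, ... of alpha.

    Follows the canonical construction: a_k = floor(alpha_k) and
    alpha_{k+1} = 1 / (alpha_k - a_k). Stops early if alpha_k is an integer.
    """
    terms = []
    for _ in range(n_terms):
        a = math.floor(alpha)
        terms.append(a)
        remainder = alpha - a
        if remainder == 0:      # expansion terminates
            break
        alpha = 1 / remainder
    return terms


def convergents(terms):
    """Return the convergents p_n/q_n as Fractions.

    Uses the recurrences p_n = a_n p_{n-1} + p_{n-2} and
    q_n = a_n q_{n-1} + q_{n-2}, started from (p_{-1}, q_{-1}) = (1, 0)
    and (p_{-2}, q_{-2}) = (0, 1).
    """
    p_prev2, q_prev2 = 0, 1
    p_prev1, q_prev1 = 1, 0
    result = []
    for a in terms:
        p = a * p_prev1 + p_prev2
        q = a * q_prev1 + q_prev2
        result.append(Fraction(p, q))
        p_prev2, q_prev2 = p_prev1, q_prev1
        p_prev1, q_prev1 = p, q
    return result


pi_terms = continued_fraction(math.pi, 6)
print(pi_terms)                 # [3, 7, 15, 1, 292, 1]
print(convergents(pi_terms))    # 3, 22/7, 333/106, 355/113, ...
```

Note that `convergents` also returns the zeroth convergent $p_0/q_0 = 3$, which the example above does not list.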

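Proposition. Let $\alpha$ be a real number and let $p_n/q_n$ be the $n$-th convergent in the canonical continued fraction expansion of $\alpha$. Then

$$|q_1\alpha - p_1| > |q_2\alpha - p_2| > |q_3\alpha - p_3| > \ldots$$

Proof. Let $\alpha = [a_0, a_1, \ldots, a_n, \alpha_{n+1}]$. Then

$$\alpha = \frac{\alpha_{n+1} p_n + p_{n-1}}{\alpha_{n+1} q_n + q_{n-1}}.$$

It follows from Proposition 31.5 that

$$|q_n\alpha - p_n| = \left| q_n \cdot \frac{p_n\alpha_{n+1} + p_{n-1}}{q_n\alpha_{n+1} + q_{n-1}} - p_n \right| = \frac{|q_n p_{n-1} - p_n q_{n-1}|}{q_n\alpha_{n+1} + q_{n-1}} = \frac{1}{q_n\alpha_{n+1} + q_{n-1}}.$$

Now note that, since $\alpha_{n+1} \geq 1$ and $a_n + 1 > \alpha_n$,

$$q_n\alpha_{n+1} + q_{n-1} \geq q_n + q_{n-1} = a_n q_{n-1} + q_{n-2} + q_{n-1} = (a_n + 1)q_{n-1} + q_{n-2} > \alpha_n q_{n-1} + q_{n-2}.$$

The observation made above allows us to conclude that

$$|q_n\alpha - p_n| = \frac{1}{q_n\alpha_{n+1} + q_{n-1}} < \frac{1}{q_{n-1}\alpha_n + q_{n-2}} = |q_{n-1}\alpha - p_{n-1}|.$$

Proposition. Let $\alpha$ be a real number and let $p_n/q_n$ be the $n$-th convergent in the canonical continued fraction expansion of $\alpha$. Then

$$\frac{1}{(a_{n+1} + 2)q_n^2} < \left| \alpha - \frac{p_n}{q_n} \right| < \frac{1}{a_{n+1} q_n^2}.$$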

Proof. Let $\alpha = [a_0, a_1, \ldots, a_n, \alpha_{n+1}]$ for some $\alpha_{n+1}$ such that $a_{n+1} \leq \alpha_{n+1} < a_{n+1} + 1$. Also, let $p_n/q_n = [a_0, a_1, \ldots, a_n]$ be the $n$-th convergent to $\alpha$. Then it follows from the formula

$$\alpha = \frac{\alpha_{n+1} p_n + p_{n-1}}{\alpha_{n+1} q_n + q_{n-1}},$$

as well as from Proposition 31.5, that

$$\left| \alpha - \frac{p_n}{q_n} \right| = \frac{1}{q_n(\alpha_{n+1} q_n + q_{n-1})}.$$

Since $q_n \geq q_{n-1}$, we can deduce the desired result by establishing the following inequalities:

$$a_{n+1} q_n < \alpha_{n+1} q_n + q_{n-1} < (a_{n+1} + 1)q_n + q_n = (a_{n+1} + 2)q_n.$$

Proposition. Let $\alpha$ be a real number and let $p_n/q_n$ be the $n$-th convergent in the canonical continued fraction expansion of $\alpha$. Then for all integers $p$ and $q$ such that $0 < q < q_{n+1}$ it is the case that

$$|q\alpha - p| \geq |q_n\alpha - p_n|.$$

Proof. Note that if $p = p_n$ and $q = q_n$ then the result holds. Thus we may assume that $p/q \neq p_n/q_n$. Recall from Proposition 31.5 that

$$p_n q_{n+1} - q_n p_{n+1} = (-1)^{n+1}.$$

Then the matrix

$$A = \begin{pmatrix} p_n & p_{n+1} \\ q_n & q_{n+1} \end{pmatrix}$$

has a non-zero determinant $\det A = (-1)^{n+1}$, which means that it is invertible. Furthermore, the inverse matrix is given by

$$A^{-1} = \frac{1}{\det A} \begin{pmatrix} q_{n+1} & -p_{n+1} \\ -q_n & p_n \end{pmatrix} = (-1)^{n+1} \begin{pmatrix} q_{n+1} & -p_{n+1} \\ -q_n & p_n \end{pmatrix}.$$

As we can see, the matrix $A^{-1}$ has integer coefficients. This means that the matrix equation

$$\begin{pmatrix} p \\ q \end{pmatrix} = A \begin{pmatrix} u \\ v \end{pmatrix}$$

can be solved in integers $u$ and $v$, and the solution is

$$u = (-1)^{n+1}(q_{n+1} p - p_{n+1} q), \qquad v = (-1)^{n+1}(p_n q - q_n p).$$

Note that $v \neq 0$ and $u \neq 0$, for otherwise it would be the case that $p/q = p_n/q_n$ or $p/q = p_{n+1}/q_{n+1}$. Of course, the latter is impossible because, according to the hypothesis, $q < q_{n+1}$. Now consider the expressions

$$p = u p_n + v p_{n+1}, \qquad q = u q_n + v q_{n+1}.$$

Note that $q = u q_n + v q_{n+1} < q_{n+1}$. We claim that $u$ and $v$ have opposite signs. If we assume that both $u$ and $v$ are negative then $q$ would be negative, which contradicts the assumption $q > 0$. On the other hand, if we assume that both $u$ and $v$ are positive, then $q$ would have to exceed $q_{n+1}$. This would contradict the inequality established above. Since neither $u$ nor $v$ can be zero, we see that our claim holds; that is, $u$ and $v$ have opposite signs.

Next, recall that according to property 3 of Proposition 31.7 either

$$\frac{p_n}{q_n} < \alpha < \frac{p_{n+1}}{q_{n+1}} \qquad \text{or} \qquad \frac{p_{n+1}}{q_{n+1}} < \alpha < \frac{p_n}{q_n}$$

must hold, depending on whether $n$ is even or odd. In any case, it must be that $\alpha q_n - p_n$ and $\alpha q_{n+1} - p_{n+1}$ have opposite signs. Since $u, v$ have opposite signs and $\alpha q_n - p_n$, $\alpha q_{n+1} - p_{n+1}$ have opposite signs, the signs of $u(q_n\alpha - p_n)$ and $v(q_{n+1}\alpha - p_{n+1})$ match. Hence

$$|q\alpha - p| = |\alpha(u q_n + v q_{n+1}) - (u p_n + v p_{n+1})| = |u(q_n\alpha - p_n) + v(q_{n+1}\alpha - p_{n+1})| = |u(q_n\alpha - p_n)| + |v(q_{n+1}\alpha - p_{n+1})| \geq |u|\,|q_n\alpha - p_n| \geq |q_n\alpha - p_n|.$$

The fact that $u(q_n\alpha - p_n)$ and $v(q_{n+1}\alpha - p_{n+1})$ have the same sign was used to establish the third equality. In turn, the last inequality follows from the fact that $|u| \geq 1$, which is a consequence of $u \neq 0$.
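The best-approximation property just proved can be checked numerically on the expansion $\sqrt{7} = [2,1,1,1,4,1,1,1,4,1,\ldots]$ from the earlier example. The sketch below is an illustration only: the names `convergent_pairs`, `best_error` and `check_best_approximation` are hypothetical, and the check uses floating-point arithmetic with a small tolerance `EPS`, which is an assumption that is harmless for denominators this small.

```python
import math

ALPHA = math.sqrt(7)
TERMS = [2, 1, 1, 1, 4, 1, 1, 1, 4, 1]   # partial quotients from the example
EPS = 1e-12                               # tolerance for float comparisons


def convergent_pairs(terms):
    """Return the list of (p_n, q_n) built from the standard recurrences."""
    p_prev2, q_prev2, p_prev1, q_prev1 = 0, 1, 1, 0
    pairs = []
    for a in terms:
        p, q = a * p_prev1 + p_prev2, a * q_prev1 + q_prev2
        pairs.append((p, q))
        p_prev2, q_prev2, p_prev1, q_prev1 = p_prev1, q_prev1, p, q
    return pairs


def best_error(q):
    """Smallest |q*alpha - p| over all integers p (attained at the nearest p)."""
    return abs(q * ALPHA - round(q * ALPHA))


def check_best_approximation(n, pairs):
    """Check |q*alpha - p| >= |q_n*alpha - p_n| for every 0 < q < q_{n+1}."""
    p_n, q_n = pairs[n]
    bound = abs(q_n * ALPHA - p_n)
    q_next = pairs[n + 1][1]
    return all(best_error(q) >= bound - EPS for q in range(1, q_next))


pairs = convergent_pairs(TERMS)
print(all(check_best_approximation(n, pairs) for n in range(len(pairs) - 1)))
```

Only the nearest integer $p$ needs to be tested for each $q$, since any other choice of $p$ makes $|q\alpha - p|$ larger.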

Corollary. Let $p/q$ be a rational number and let $\alpha$ be a real number. Then the inequality

$$\left| \alpha - \frac{p}{q} \right| < \frac{1}{2q^2}$$

implies that $p/q = p_n/q_n$ for some non-negative integer $n$. That is, the number $p/q$ appears as a convergent in the canonical continued fraction expansion of $\alpha$.

Proof. See Assignment 6.

We conclude this section by discussing the question of periodicity of canonical continued fraction expansions.

Definition. Let $\alpha$ be a real number with the canonical continued fraction expansion

$$\alpha = [a_0, a_1, \ldots, a_n; b_1, b_2, \ldots, b_k, b_1, b_2, \ldots, b_k, b_1, \ldots].$$

In other words, at some point the elements of the continued fraction expansion start to repeat. We indicate this by writing

$$\alpha = [a_0, a_1, \ldots, a_n; \overline{b_1, b_2, \ldots, b_k}].$$

A canonical continued fraction expansion of this kind is called preperiodic, and if the terms $a_0, a_1, a_2, \ldots, a_n$ are missing we say that it is periodic. The smallest number $k$ such that the terms repeat is called the period of a continued fraction.

It was proved by Joseph-Louis Lagrange that a real number $\alpha$ has a preperiodic canonical continued fraction expansion if and only if it is a quadratic irrational. That is, $\alpha = a + b\sqrt{d}$ for some rational numbers $a$ and $b \neq 0$, where $d$ is a positive integer that is not a perfect square.

Example. Let us determine the canonical continued fraction expansion of $\sqrt{7}$. By computing the first few terms, we see that $\sqrt{7} = [2,1,1,1,4,1,1,1,4,1,\ldots]$. Thus we can guess that $\sqrt{7} = [2; \overline{1,1,1,4}]$. Let us prove this fact. Let $\theta = [\overline{1,1,1,4}]$. Then

$$\theta = [\overline{1,1,1,4}] = [1,1,1,4,\theta] = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{4 + \cfrac{1}{\theta}}}} = \frac{14\theta + 3}{9\theta + 2}.$$

We see that $\theta$ satisfies the equation $3\theta^2 - 4\theta - 1 = 0$. The above equation has two roots, but since $\theta > 0$ we can conclude that $\theta = (2 + \sqrt{7})/3$. Then

$$[2; \overline{1,1,1,4}] = [2, \theta] = 2 + \frac{1}{\theta} = 2 + \frac{3}{2 + \sqrt{7}} = 2 + (\sqrt{7} - 2) = \sqrt{7},$$

as claimed.

Exercise. Determine canonical continued fraction expansions for … and $\sqrt{2}$. Are they both preperiodic? Are they both periodic? What are the periods of their continued fraction expansions?

Exercise. Prove that if a real number $\alpha$ has a preperiodic canonical continued fraction expansion, then there exist rational integers $a$, $b$ and $c$, not all zero, such that $a\alpha^2 + b\alpha + c = 0$.

32 The Pell's Equation

For more details on the subject, we refer the reader to the monograph of M. J. Jacobson, Jr. and H. C. Williams, Solving the Pell Equation.

In 1773, Gotthold Ephraim Lessing was appointed librarian of the Herzog August Library in Wolfenbüttel, Germany. In this library, he discovered an ancient Greek manuscript with a poem of 44 lines, which contained an interesting arithmetical problem. This problem is attributed to Archimedes and is called the Archimedes Cattle Problem. The problem was to calculate the number of cattle in the herd of Helios, the god of the sun. There were two parts to this problem, the first of which could be solved relatively easily by setting up a system of seven equations with eight unknowns, one unknown for each type of bulls and cows present in the herd. Much more challenging was the second part of the problem, which, in its essence, asked the reader to calculate a solution to the equation $x^2 - \ldots\, y^2 = 1$.

Despite its innocent look, the smallest solution to this equation has more than … digits. In 1880, A. Amthor discovered that the smallest herd that could satisfy both parts of this problem had approximately … bulls. In comparison, it is conjectured that there are between … and … atoms in the known, observable universe.⁴⁵ Of course, Amthor himself did not calculate this number precisely. In 1965, the precise answer to the Archimedes Cattle Problem was given by Hugh Williams, Gus German and Robert Zarnke, who were University of Waterloo students at that time. To calculate the answer, they used a combination of the IBM 7040 and IBM 1620 computers. You can find a fascinating article about the history of computing at the University of Waterloo here:

⁴⁵ According to

An equation of the form $x^2 - dy^2 = \pm 1$, where $d$ is a positive integer that is not a perfect square, is called a Pell's equation. The name is due to Euler, who attributed the method of solving this equation to John Pell. It is widely believed that Euler actually made a mistake and confused John Pell with William Brouncker. The English mathematician William Brouncker discovered a general method for solving Pell's equation, which was based on continued fractions. He was able to apply it to the equation

$$x^2 - 313y^2 = 1$$

and find the smallest positive solution $x = \ldots$, $y = \ldots$ When writing to Frenicle de Bessy, who proposed this problem to him, Brouncker claimed that it only took him an hour or two to find the solution.

In 1768, Joseph-Louis Lagrange managed to prove that Pell's equation $x^2 - dy^2 = 1$ has a solution different from $(\pm 1, 0)$ for every positive $d$ that is not a perfect square. We will now apply the Corollary from the end of Section 31 to show that every positive solution to Pell's equation $x^2 - dy^2 = \pm 1$ must arise as a convergent of $\sqrt{d}$.

Theorem. Let $d$ be a positive integer that is not a perfect square. Then every solution $(x, y) \neq (\pm 1, 0)$ to Pell's equation $x^2 - dy^2 = \pm 1$

must satisfy $x/y = p_n/q_n$ for some non-negative integer $n$, where $p_n/q_n$ is the $n$-th convergent of $\sqrt{d}$.

Proof. Suppose that $(x, y) \neq (\pm 1, 0)$ is a solution. Without loss of generality, we may assume that $x$ and $y$ are positive. Then

$$x \geq \sqrt{dy^2 - 1} \geq y\sqrt{d - 1}.$$

Therefore

$$\left| x - \sqrt{d}\,y \right| = \frac{1}{x + \sqrt{d}\,y} \leq \frac{1}{\left(\sqrt{d - 1} + \sqrt{d}\right) y} < \frac{1}{2y},$$

since $\sqrt{d} + \sqrt{d - 1} > 2$ for $d \geq 2$. Thus

$$\left| \sqrt{d} - \frac{x}{y} \right| < \frac{1}{2y^2}.$$

It follows from the Corollary at the end of Section 31 that $x/y$ is a convergent of $\sqrt{d}$.

33 Algebraic and Transcendental Numbers. Liouville's Approximation Theorem

In 1840, the French mathematician Joseph Liouville proved the so-called Approximation Theorem, which allowed him to discover the first transcendental number

$$\sum_{k=0}^{\infty} 10^{-k!}.$$

This number is called the Liouville Number. You are asked to reproduce Liouville's proof for a different number in an exercise later in this section.

Definition. A complex number $\alpha$ is called algebraic if there exists a non-zero polynomial $f(t)$ with rational coefficients such that $f(\alpha) = 0$. Otherwise, it is called transcendental.

Definition. Let $\alpha$ be an algebraic number. Let

$$f(t) = c_d t^d + c_{d-1} t^{d-1} + \ldots + c_1 t + c_0$$

be a polynomial such that

a) $f(\alpha) = 0$;

b) $c_0, c_1, \ldots, c_d \in \mathbb{Z}$;

c) $c_d > 0$;

d) $\gcd(c_0, c_1, \ldots, c_d) = 1$;

e) The polynomial $f(t)$ has the smallest degree among all non-zero polynomials satisfying a), b), c) and d).

Then $f(t)$ is called the minimal polynomial of $\alpha$. It is a fact from algebraic number theory that such a polynomial is unique. We say that the algebraic number $\alpha$ has degree $d$ if the degree of its minimal polynomial is equal to $d$, i.e. $\deg f = d$.

Example. Consider the number $\sqrt{2}$. This number is algebraic, since $\sqrt{2}$ is a root of the polynomial $f(t) = t^2 - 2$, which has rational coefficients. Note that it is also a root of $f_1(t) = 0$, or $f_2(t) = t^3 + 3t^2 - 2t - 6$, or $f_3(t) = 6t \ldots$ However, none of these polynomials satisfies the definition of the minimal polynomial.

Exercise. Explain why the numbers $\alpha = 0, 1/2, i, \ldots$ are algebraic. For each $\alpha$, find a non-zero monic polynomial with rational coefficients such that $f(\alpha) = 0$.

Exercise. a) Prove that every rational number $x/y$ has degree 1; b) Prove that every quadratic irrational has degree 2. In other words, show that every number of the form $a + b\sqrt{d}$, where $a, b, d \in \mathbb{Q}$, $b \neq 0$ and $d \neq \pm r^2$ for any $r \in \mathbb{Q}$, satisfies some polynomial $f(x)$ of degree 2 and does not satisfy any polynomial of degree 1.

Some properties of a minimal polynomial:

• For a given algebraic number $\alpha$ the minimal polynomial of $\alpha$ is unique;

• Every minimal polynomial $f(t)$ is irreducible over the field of rational numbers. That is, if $g(t) \mid f(t)$ and $g(t) \in \mathbb{Q}[t]$, then $g(t)$ is a non-zero rational multiple of $f(t)$ or a non-zero constant;

• Let $\alpha$ be a root of its minimal polynomial $f(t)$. Then $f'(\alpha) \neq 0$. That is, in $\mathbb{C}[t]$ it is the case that $(t - \alpha) \mid f(t)$ while $(t - \alpha)^2 \nmid f(t)$.

Theorem (Liouville's Approximation Theorem, 1840). Let $\alpha$ be an irrational algebraic number (that is, a number of degree $d \geq 2$). Then there exists some constant $C > 0$, which depends only on $\alpha$, such that for any $x \in \mathbb{Z}$, $y \in \mathbb{N}$ the following inequality holds:

$$\left| \alpha - \frac{x}{y} \right| \geq \frac{C}{y^d}.$$

Proof.⁴⁶ Let $f(t) = c_d t^d + \ldots + c_1 t + c_0$ be the minimal polynomial of $\alpha$. Since $f$ is irreducible over $\mathbb{Q}$ and is of degree $d \geq 2$, it has no rational roots, so $f(x/y) \neq 0$ for any $x \in \mathbb{Z}$, $y \in \mathbb{N}$. Furthermore,

$$\left| f\!\left(\frac{x}{y}\right) \right| = \left| \sum_{k=0}^{d} c_k \left(\frac{x}{y}\right)^k \right| = \frac{1}{y^d} \underbrace{\left| \sum_{k=0}^{d} c_k x^k y^{d-k} \right|}_{\in\, \mathbb{N}} \geq \frac{1}{y^d}.$$

We now apply the Mean Value Theorem and observe that there exists some real $\xi$ between $\alpha$ and $x/y$ satisfying

$$f'(\xi) = \frac{f(\alpha) - f(x/y)}{\alpha - x/y} = \frac{-f(x/y)}{\alpha - x/y}.$$

Rearranging the terms of the above equality, we get

$$\left| \alpha - \frac{x}{y} \right| = \left| f\!\left(\frac{x}{y}\right) \right| \cdot |f'(\xi)|^{-1} \geq |f'(\xi)|^{-1} \cdot \frac{1}{y^d}.$$

For now, our constant $|f'(\xi)|^{-1}$ depends on $\alpha$ and $y$ (note that $x$ depends on $\alpha$ and $y$), but it is not hard to eliminate the dependency on $y$ by slightly adjusting our constant. In particular, since $f$ is minimal, the multiplicity of $\alpha$ is 1, which means that $f'(\alpha) \neq 0$. This means that for all $\xi$ within some small neighbourhood $U_\alpha$ of $\alpha$, it must be the case that $0 < |f'(\xi)| \leq 2|f'(\alpha)|$. Plainly, there exists some large $y_0$, which depends only on $\alpha$, such that some rational fraction $x/y$ with the denominator $y \geq y_0$ falls into $U_\alpha$. We conclude that

$$\left| \alpha - \frac{x}{y} \right| \geq \frac{|f'(\xi)|^{-1}}{y^d} \geq \frac{|f'(\alpha)|^{-1}}{2y^d}$$

for all $y \geq y_0$. Finally, we choose our constant $C$ by picking the minimum between $2^{-1}|f'(\alpha)|^{-1}$ and $y^d|\alpha - x/y|$ over all $y < y_0$. This concludes the proof.

⁴⁶ The proof is from P. Garrett, Liouville's theorem on diophantine approximation. See approx.pdf. There is an error in these notes: instead of estimating $|f'(\xi)|$ from above, the author obtains the estimate from below.
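To get a feeling for the inequality, one can test it numerically for $\alpha = \sqrt{2}$, where $d = 2$. The sketch below is not part of the original notes: it assumes the value $C = 1/4$ (the constant quoted for $\sqrt{2}$ in the discussion that follows), the helper name `scaled_error` is made up, and floating-point arithmetic is used, which is adequate only for moderately sized denominators.

```python
import math

ALPHA = math.sqrt(2)
DEGREE = 2      # degree of sqrt(2) as an algebraic number
C = 0.25        # assumed constant for sqrt(2)


def scaled_error(y):
    """Return y^d * |alpha - x/y| for the numerator x closest to y*alpha.

    Liouville's inequality |alpha - x/y| >= C / y^d says this quantity
    never drops below C. Other numerators x only make it larger.
    """
    x = round(y * ALPHA)
    return y ** DEGREE * abs(ALPHA - x / y)


worst = min(scaled_error(y) for y in range(1, 2001))
print(worst >= C)   # True: no denominator up to 2000 beats the bound
```

A finite search like this cannot prove the theorem, of course; it only illustrates that the scaled error $y^d|\alpha - x/y|$ stays bounded away from zero.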

Liouville's Approximation Theorem is a very elegant result which can be explained on a rather intuitive level. As $y$ grows, we certainly expect our approximations $x/y$ of $\alpha$ to be more precise. The question is, to what extent, and how can we measure the quality of our approximation? The theorem tells us that an irrational algebraic number cannot be approximated too well by rational numbers. One intuitive explanation of this phenomenon is the following: no fraction will approximate $\alpha$ better than up to

$$d + \log_y \frac{1}{C}$$

base-$y$ places. For example, when $\alpha = \sqrt{2}$ one may take $C = 1/4$, and observe that for all $y \geq 2$

$$\left| \sqrt{2} - \frac{x}{y} \right| \geq \frac{1}{4y^2}.$$

One of the ways to interpret the above inequality is as follows: no fraction $x/y$ for $y > 2$ will approximate $\sqrt{2}$ significantly better than up to 2 base-$y$ places.

Many more things can be said regarding Liouville's inequality. For example, one may ask what happens if we make $C$ a function of $y$:

$$\left| \alpha - \frac{x}{y} \right| \geq \frac{C(y)}{y^d}.$$

It turns out that for $d = 2$ one cannot replace the constant $C$ with some monotonically increasing function $C(y)$, but for $d \geq 3$ this can be done. The first improvement of this kind was introduced by Thue in 1909, who showed that one can take $C(y) = c_1 y^{\frac{d}{2} - 1 - \varepsilon}$ for some constant $c_1$, which depends only on $\alpha$, and any $\varepsilon > 0$. This result allowed him to prove Thue's Theorem. Further improvements were developed by Siegel, Gelfond and Dyson, until in 1955 Roth showed that $C(y) = c_1 y^{d - 2 - \varepsilon}$ would do the job as well. In basic terms, his result states that there are only finitely many rational approximations $x/y$ to $\alpha$ of degree $\geq 3$ which result in more than $2 + \varepsilon$ accurate base-$y$ places.

Exercise. (a) Prove that, for every integer $n \geq 1$, the number

$$\alpha := \sum_{k=0}^{\infty} \frac{1}{2^{k!}} = 1.265625\ldots$$

satisfies the inequality

$$\left| \alpha - \sum_{k=0}^{n} \frac{1}{2^{k!}} \right| < \frac{1}{(2^{n!})^n}. \qquad (9)$$

Hint: Note that

$$\sum_{k=n+1}^{\infty} \frac{1}{2^{k!}} < \sum_{k=(n+1)!}^{\infty} \frac{1}{2^k}.$$

Use the formula for the infinite geometric series afterwards.

(b) Use Liouville's Theorem and the inequality established in Part (a) to prove that the number $\alpha$ is either rational or transcendental.

Hint: Suppose not. Then there exist a fixed integer $d \geq 2$ and a constant $C > 0$ such that

$$\left| \alpha - \frac{x}{y} \right| \geq \frac{C}{y^d}$$

for all integers $x$ and $y > 0$. Why does this inequality contradict the inequality (9)?

34 Elliptic Curves

Let $n$ be a squarefree number. We say that $n$ is congruent if there exists a right triangle with rational sides whose area is $n$. For example, the number 5 is congruent since it is the area of the right triangle with rational sides 20/3, 3/2 and 41/6. The number 6 is also congruent, since it is the area of the right triangle with sides 3, 4 and 5. In contrast, the number 3 is not congruent. Also, note that if $n$ is congruent, then any integer of the form $s^2 n$ also trivially arises as the area of a right triangle with rational sides. That is why we restrict our attention only to squarefree $n$.

Given a squarefree number $n$, how can we find out whether it is congruent or not? Essentially, what we need to do is to solve the system of equations

$$\begin{cases} a^2 + b^2 = c^2; \\ \tfrac{1}{2}ab = n \end{cases}$$

for $a, b, c \in \mathbb{Q}$. Set

$$x = \frac{n(a + c)}{b}, \qquad y = \frac{2n^2(a + c)}{b^2}.$$

Then

$$y^2 = x^3 - n^2 x,$$

where $y \neq 0$. Thus, instead of the original system of equations we just have to find $x, y \in \mathbb{Q}$ with $y \neq 0$ such that $y^2 = x^3 - n^2 x$. If such rational $x$ and $y$ exist, one can easily obtain a solution to the original system of equations by setting

$$a = \frac{x^2 - n^2}{y}, \qquad b = \frac{2nx}{y}, \qquad c = \frac{x^2 + n^2}{y}.$$

Thus we just have to find a rational point $(x, y)$ on the curve $y^2 = x^3 - n^2 x$. Such a curve is an example of an elliptic curve.

Definition. Let $F$ be one of the fields $\mathbb{F}_q, \mathbb{Q}, \mathbb{R}, \mathbb{C}$, where $q$ is a prime power.⁴⁷ Let $a, b \in F$ be such that $4a^3 + 27b^2 \neq 0$. The collection

$$E(F) = \left\{ (x, y) \in F^2 : y^2 = x^3 + ax + b \right\} \cup \{\infty\}$$

is called an elliptic curve, defined over the field $F$. Here $\infty$ denotes the point at infinity. The value $\Delta = -16(4a^3 + 27b^2)$ is called the discriminant of an elliptic curve $E(F)$.

Example. The graph of an elliptic curve $E_1 : y^2 = x^3 - 25x$ over $\mathbb{R}$ is depicted in Figure 5. This elliptic curve, aside from the trivial rational points $(0, 0)$ and $(\pm 5, 0)$, contains a rational point $(x, y) = (45, 300)$. This fact implies that the number 5 is congruent. Furthermore, one can show that in the case of $E_1(\mathbb{Q})$ the existence of one non-trivial rational point implies the existence of infinitely many rational points. In contrast, $E_2 : y^2 = x^3 - 9x$ has no non-trivial rational points, so the elliptic curve $E_2(\mathbb{Q})$ contains only four points, namely $(0, 0)$, $(\pm 3, 0)$ and the point at infinity. Both curves $E_1(\mathbb{R})$ and $E_2(\mathbb{R})$ contain infinitely many points.

Also, note that the graph of $E_1(\mathbb{R})$ contains two connected components. This is because the discriminant of $E_1$ is equal to $\Delta(E_1) = 10^6$ and is positive. In contrast, the discriminant of $E_3 : y^2 = x^3 - 2$ is equal to $\Delta(E_3) = -1728$ and is negative. The negative sign indicates that the graph of $E_3(\mathbb{R})$ has one connected component.

Exercise. Find integers $a$ and $b$ such that the discriminant of a curve $y^2 = x^3 + ax + b$ is equal to zero. What does the graph of such a curve look like?

⁴⁷ Here $\mathbb{F}_q$ denotes the finite field of order $q$. We will not give a rigorous construction of $\mathbb{F}_q$ here. We remark though that when $q$ is prime the finite field $\mathbb{F}_q$ is the same as $\mathbb{Z}_q$, the ring of residue classes modulo $q$.
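The correspondence between rational right triangles of area $n$ and rational points on $y^2 = x^3 - n^2 x$ can be checked with exact rational arithmetic. The sketch below uses the substitution $x = n(a + c)/b$, $y = 2n^2(a + c)/b^2$ and its inverse $a = (x^2 - n^2)/y$, $b = 2nx/y$, $c = (x^2 + n^2)/y$; the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def triangle_to_point(a, b, c, n):
    """Map a right triangle (a, b, c) of area n to a point (x, y) on the curve."""
    x = n * (a + c) / b
    y = 2 * n ** 2 * (a + c) / b ** 2
    return x, y


def point_to_triangle(x, y, n):
    """Map a point (x, y), y != 0, on y^2 = x^3 - n^2 x back to a triangle."""
    x, y = Fraction(x), Fraction(y)   # force exact rational arithmetic
    return (x ** 2 - n ** 2) / y, 2 * n * x / y, (x ** 2 + n ** 2) / y


def on_curve(x, y, n):
    """Check whether (x, y) satisfies y^2 = x^3 - n^2 x."""
    return y ** 2 == x ** 3 - n ** 2 * x


# The triangle with sides 20/3, 3/2, 41/6 has area 5.
a, b, c = Fraction(20, 3), Fraction(3, 2), Fraction(41, 6)
x, y = triangle_to_point(a, b, c, 5)
print(x, y, on_curve(x, y, 5))        # 45 300 True
print(point_to_triangle(45, 300, 5))  # recovers (20/3, 3/2, 41/6)
```

Swapping the legs $a$ and $b$ of the same triangle produces a different rational point on the same curve, which is consistent with the curve having more than one non-trivial rational point.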

Figure 5: Elliptic curves $y^2 = x^3 - 25x$ and $y^2 = x^3 - 2$

Many problems in number theory are actually connected to elliptic curves. For example, consider the Fermat equation $a^3 + b^3 = c^3$. The question of existence of non-trivial solutions to this Diophantine equation is equivalent to solving the equation $u^3 + v^3 = 1$ in rational numbers $u$ and $v$. If we now let

$$x = 12(u^2 - uv + v^2), \qquad y = 36(u - v)(u^2 - uv + v^2),$$

then

$$y^2 = x^3 - 432.$$

If some point $(x, y) \in \mathbb{Q}^2$ lies on the elliptic curve determined by the above equation, then it is straightforward to check that the numbers

$$u = \frac{36 + y}{6x}, \qquad v = \frac{36 - y}{6x}$$

are rational and satisfy $u^3 + v^3 = 1$. So once again the existence of a solution to some Diophantine equation reduces to the question of existence of a non-trivial rational point on some elliptic curve.

The first questions about elliptic curves date back to Diophantus of Alexandria, who looked at the Diophantine equation of the form

$$y(6 - y) = x^3 - x.$$

Fermat claimed that he knew how to solve the Diophantine equation $y^2 = x^3 + 1$, but did not provide his proof. The problem was fully resolved only one century later by Euler. The field of algebraic number theory essentially was born when Euler tried to solve the Diophantine equation $y^2 = x^3 - 2$ by writing

$$x^3 = y^2 + 2 = (y + \sqrt{-2})(y - \sqrt{-2})$$

and then claiming that $y + \sqrt{-2}$ and $y - \sqrt{-2}$ are coprime, without rigorously explaining what coprimeness means in this setting. Of course, his intuition was correct: the ring $\mathbb{Z}[\sqrt{-2}]$ is a Unique Factorization Domain, and indeed one can show that $y + \sqrt{-2}$ and $y - \sqrt{-2}$ are coprime in $\mathbb{Z}[\sqrt{-2}]$, as long as $y \neq 0$.

Elliptic curves have been studied extensively over the past two centuries. The theory of elliptic curves truly blossomed with the prominent work of Weierstrass on elliptic functions, which connects elliptic curves defined over the field of complex numbers $\mathbb{C}$ to lattices in the complex plane. In fact, every elliptic curve arises from (or can be reduced to) a lattice in the complex plane! Elliptic curves are intimately connected to modular forms, and the development of the theories of elliptic curves and modular forms resulted in Andrew Wiles's proof of Fermat's Last Theorem (see Section 28 for more details).

Other prominent mathematicians who contributed a lot to the development of the theory of elliptic curves were Abel and Jacobi. By studying so-called elliptic integrals, they realized that, in fact, one can impose arithmetic on points of an elliptic curve. More precisely, such an arithmetic takes place whenever an elliptic curve $E(F)$ is defined over a field $F$. This is why we restrict our attention only to $F = \mathbb{F}_q, \mathbb{Q}, \mathbb{R}, \mathbb{C}$ and not, say, $\mathbb{Z}$ or $\mathbb{Z}/p^k\mathbb{Z}$ for $p$ prime and $k \geq 2$. The latter two collections are rings but not fields.

To explain what this means, consider for now some elliptic curve $E$ defined over the field of real numbers $\mathbb{R}$. For two distinct points $P, Q \in E(\mathbb{R})$, we draw a line through $P$ and $Q$. Of course, this line is uniquely defined. For now, let us assume that this line is tangent to $E$ neither at $P$ nor at $Q$ (see the first picture in Figure 6).⁴⁸ Our line will intersect $E$ at some third point, say $R$. Our arithmetic on an elliptic curve is then defined as follows:

$$P + Q + R = \infty;$$

that is, any three points $P$, $Q$ and $R$ which lie on $E$ and on a common line add up to $\infty$ (the point at infinity). Alternatively, if $R = (x_R, y_R)$, we can write $P + Q = -R$, so by adding two points together we were able to produce the third point, namely $-R = (x_R, -y_R)$. In Figure 6, the point at infinity is actually denoted by 0. Soon we will see that there is a deep reason for this alternative notation.

⁴⁸ The picture is taken from Wikipedia: commons/thumb/7/77/ecclines-2.0.svg/680px-ecclines-2.0.svg.png.

Figure 6: Group law

We can formalize the observations made above as follows. Let $P = (x_P, y_P)$ and $Q = (x_Q, y_Q)$. Suppose that $x_P \neq x_Q$. Let

$$s = \frac{y_P - y_Q}{x_P - x_Q}$$

denote the slope of the line passing through the points $P$ and $Q$. Then we define the point $R = (x_R, y_R) = P + Q$ as follows:

$$x_R = s^2 - x_P - x_Q, \qquad y_R = -y_P + s(x_P - x_R).$$

It is straightforward to verify that $R$ indeed belongs to $E(\mathbb{R})$. Furthermore, if we look closer at the expressions for $x_R$ and $y_R$, we can notice that they preserve the field of definition. That is, if $P$ and $Q$ are points in $\mathbb{R}^2$, then $R$ is also a point in $\mathbb{R}^2$. If $P$ and $Q$ are points in $\mathbb{Q}^2$, then so is $R$. This applies to any field, so the procedure of addition of points is well-defined for any base field $F$. See Figure 7 for the demonstration that the field of definition remains unchanged. All the points in this example belong to $\mathbb{Z}^2$, and therefore to $\mathbb{Q}^2$ as well. Note, however, that in general an addition of two integer points on an elliptic curve may not result in an integer point, but it will result in a rational point or the point at infinity.⁴⁹

⁴⁹ The picture is taken from William Stein's lecture notes, Chapter 6, Figure 6.3: wstein.org/simuw06/ch6.pdf.
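The chord construction can be tried out with exact rational arithmetic. The sketch below is a minimal illustration, assuming the formulas $s = (y_P - y_Q)/(x_P - x_Q)$, $x_R = s^2 - x_P - x_Q$ and $y_R = -y_P + s(x_P - x_R)$ for $R = P + Q$; it handles only the case $x_P \neq x_Q$ (no doubling, no point at infinity), and the function name `add_points` is made up for this example. The curve $y^2 = x^3 - 25x$ and the points on it are taken from the earlier example.

```python
from fractions import Fraction


def add_points(P, Q):
    """Add two points with distinct x-coordinates using the chord rule.

    The result does not depend on the coefficients of the curve, only on
    the fact that P and Q lie on a curve of the form y^2 = x^3 + ax + b.
    """
    x_p, y_p = (Fraction(t) for t in P)
    x_q, y_q = (Fraction(t) for t in Q)
    if x_p == x_q:
        raise ValueError("chord rule requires x_P != x_Q")
    s = (y_p - y_q) / (x_p - x_q)          # slope of the line through P and Q
    x_r = s ** 2 - x_p - x_q
    y_r = -y_p + s * (x_p - x_r)
    return x_r, y_r


def on_curve(point):
    """Check membership in E_1 : y^2 = x^3 - 25x."""
    x, y = point
    return y ** 2 == x ** 3 - 25 * x


P, Q = (-4, 6), (0, 0)                     # two integer points on E_1
R = add_points(P, Q)
print(R, on_curve(R))                      # the sum is rational but not integral
print(add_points((-4, 6), (45, 300)))      # here the sum is again an integer point
```

The first sum illustrates the remark above: adding two integer points may produce a point with non-integer rational coordinates, while the result always stays on the curve.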
