HENSEL S LEMMA KEITH CONRAD 1. Introduction In the -adic integers, congruences are aroximations: for a and b in Z, a b mod n is the same as a b 1/ n. Turning information modulo one ower of into similar information modulo a higher ower of can be interreted as imroving an aroximation. Examle 1.1. The number 7 is a square mod 3: 7 1 2 mod 3. Although 7 1 2 mod 9, we can write 7 as a square mod 9 by relacing 1 with 1 + 3: 7 (1 + 3) 2 mod 9. Here are exressions of 7 as a square modulo further owers of 3: 7 (1 + 3 + 3 2 ) 2 mod 3 3, 7 (1 + 3 + 3 2 ) 2 mod 3 4, 7 (1 + 3 + 3 2 + 2 3 4 ) 2 mod 3 5,. 7 (1 + 3 + 3 2 + 2 3 4 + 2 3 7 + 3 8 + 3 9 + 2 3 10 ) 2 mod 3 11. If we can kee going indefinitely then 7 is a erfect square in Z 3 with square root 1 + 3 + 3 2 + 2 3 4 + 2 3 7 + 3 8 + 3 9 + 2 3 10 +. That we really can kee going indefinitely is justified by Hensel s lemma, which will rovide conditions under which the root of a olynomial mod can be lifted to a root in Z, such as the olynomial X 2 7 with = 3: its two roots mod 3 can both be lifted to square roots of 7 in Z 3. We will first give a basic version of Hensel s lemma, illustrate it with examles, and then give a stronger version that can be alied in cases where the basic version is inadequate. 2. A Basic Version of Hensel s lemma Theorem 2.1 (Hensel s lemma). If f(x) Z [X] and a Z satisfies 0 mod, f (a) 0 mod then there is a unique α Z such that f(α) = 0 and α a mod. Examle 2.2. Let f(x) = X 2 7. Then f(1) = 6 0 mod 3 and f (1) = 2 0 mod 3, so Hensel s lemma tells us there is a unique 3-adic integer α such that α 2 = 7 and α 1 mod 3. We saw aroximations to α in Examle 1.1, e.g., α 1 + 3 + 3 2 + 2 3 4 mod 3 5. Proof. We will rove by induction that for each n 1 there is an a n Z such that f(a n ) 0 mod n, a n a mod. The case n = 1 is trivial, using a 1 = a. If the inductive hyothesis holds for n, we seek a n+1 Z such that f(a n+1 ) 0 mod n+1, a n+1 a mod. 1
2 KEITH CONRAD Since f(a n+1 ) 0 mod n+1 f(a n+1 ) 0 mod n, each root of f(x) mod n+1 reduces to a root of f(x) mod n. By the inductive hyothesis there is a root a n mod n, so we seek a -adic integer a n+1 such that a n+1 a n mod n and f(a n+1 ) 0 mod n+1. Writing a n+1 = a n + n t n for some t n Z to be determined, can we make f(a n + n t n ) 0 mod n+1? To comute f(a n + n t n ) mod n+1, we use a olynomial identity: (2.1) f(x + Y ) = f(x) + f (X)Y + g(x, Y )Y 2 for some olynomial g(x, Y ) Z [X, Y ]. This formula comes from isolating the first two terms in the binomial theorem: writing f(x) = d i=0 c ix i we have f(x + Y ) = c i (X + Y ) i = c 0 + c i (X i + ix i 1 Y + g i (X, Y )Y 2 ), i=0 where g i (X, Y ) Z[X, Y ]. Thus f(x + Y ) = c i X i + ic i X i 1 Y + i=0 i=1 i=1 g i (X, Y )Y 2 = f(x) + f (X)Y + g(x, Y )Y 2, i=1 where g(x, Y ) = d i=1 c ig i (X, Y ) Z [X, Y ]. This gives us the desired identity. 1 To make (2.1) numerical, for all x and y in Z the number z := g(x, y) is in Z, so (2.2) x, y Z = f(x + y) = f(x) + f (x)y + zy 2, where z Z. In this formula set x = a n and y = n t n : (2.3) f(a n + n t n ) = f(a n ) + f (a n ) n t n + z 2n t 2 n f(a n ) + f (a n ) n t n mod n+1 since 2n n + 1. In f (a n ) n t n mod n+1, the factors f (a n ) and t n only matter mod since there is already a factor of n and the modulus is n+1. Recalling that a n a mod, we get f (a n ) n t n f (a) n t n mod n+1. Therefore from (2.3), f(a n + n t n ) 0 mod n+1 f(a n ) + f (a) n t n 0 mod n+1 f (a)t n f(a n )/ n mod, where the ratio f(a n )/ n is in Z since we assumed that f(a n ) 0 mod n. There is a solution for t n in the congruence mod since we assumed that f (a) 0 mod. Armed with this choice of t n and setting a n+1 = a n + n t n, we have f(a n+1 ) 0 mod n+1 and a n+1 a n mod n, so in articular a n+1 a mod. This comletes the induction. Starting with a 1 = a, our inductive argument has constructed a sequence a 1, a 2, a 3,... in Z such that f(a n ) 0 mod n and a n+1 a n mod n for all n. The second condition, a n+1 a n mod n, imlies that {a n } is a Cauchy sequence in Z. Let α be its limit in Z. We want to show f(α) = 0 and α a mod. From a n+1 a n mod n for all n we get a m a n mod n for all m > n, so α a n mod n by letting m. At n = 1 we get α a mod. For general n, α a n mod n = f(α) f(a n ) 0 mod n = f(α) 1 n. Since this estimate holds for all n, f(α) = 0. 1 The identity (2.1) is similar to Taylor s formula: f(x + h) = f(x) + f (x)h + (f (x)/2!)h 2 +. The catch is that terms in Taylor s formula have factorials in the denominator, which can require some extra care when reducing modulo owers of : think about f (x)/2! mod 2, for instance. What (2.1) essentially does is extract the first two terms of Taylor s formula and say that what remains has -adic integral coefficients, so (2.1) can be reduced mod, or mod n for all n 1.
HENSEL S LEMMA 3 It remains to show α is the unique root of f(x) in Z that is congruent to a mod. Suose f(β) = 0 and β a mod. To show β = α we will show β α mod n for all n. The case n = 1 is clear since α and β are both congruent to a mod. Suose n 1 and we know that β α mod n. Then β = α + n γ n with γ n Z, so a calculation similar to (2.3) imlies f(β) = f(α + n γ n ) f(α) + f (α) n γ n mod n+1. Both α and β are roots of f(x), so 0 f (α) n γ n mod n+1. Thus f (α)γ n 0 mod. Since f (α) f (a) 0 mod, we have γ n 0 mod, which imlies β α mod n+1. Remark 2.3. An argument similar to the last aragrah shows for all n 1 that f(x) has a unique root mod n that reduces to a mod. So in Theorem 2.1 we can think about the uniqueness of the lifting of the mod root in two ways: it has a unique lifting to a root in Z or it has a unique lifting to a root in Z/( n ) for all n 1. Here are five alications of Hensel s lemma. Examle 2.4. Let f(x) = X 3 2. We have f(3) 0 mod 5 and f (3) 0 mod 5, Therefore Hensel s lemma with initial aroximation a = 3 tells us there is a unique cube root of 2 in Z 5 that is congruent to 3 mod 5. Exlicitly, it is 3 + 2 5 2 + 2 5 3 + 3 5 4 +. Examle 2.5. Let f(x) = X 3 X 2. We have f(0) 0 mod 2 and f(1) 0 mod 2, while f (0) 1 mod 2 and f (1) 0 mod 2. Therefore Hensel s lemma with initial aroximation a = 0 imlies there is a unique α Z 2 such that f(α) = 0 and α 0 mod 2. Exlicitly, α = 2 + 2 2 + 2 4 + 2 7 +. Although 1 is a root of f(x) mod 2, it does not lift to a root in Z 2 since it doesn t even lift to a root mod 4: if f(β) = 0 and β 1 mod 2 then f(β) 0 mod 4 and β 1 or 3 mod 4, but f(1) 2 mod 4 and f(3) 2 mod 4. Examle 2.6. For each ositive integer nnot divisible by and each u 1 mod Z, u is an nth ower in Z. Aly Hensel s lemma to f(x) = X n u with initial aroximation a = 1: f(1) = 1 u 0 mod and f (1) = n 0 mod. Therefore there is a unique solution to α n = u in Z such that α 1 mod. Examle 1.1 is the case u = 7, = 3, and n = 2: 7 has a unique 3-adic square root that is 1 mod 3. Examle 2.7. For an odd rime, suose u Z is a square mod. We will show u is a square in Z. For examle, 2 is a square mod 7 since 2 3 2 mod 7, and it will follow that 2 is a square in Z 7. Write u a 2 mod, so a 0 mod. For the olynomial f(x) = X 2 u we have 0 mod and f (a) = 2a 0 mod, since is not 2, so Hensel s lemma tells us that f(x) has a root in Z that reduces to a mod, which means u is a square in Q. Conversely, if u Z is a -adic square, say u = v 2, then 1 = v 2, so v Z and u v 2 mod. Thus the elements of Z that are squares in Q are recisely those that reduce to squares mod. For examle, the nonzero squares mod 7 are 1, 2, and 4, so u Z 7 is a 7-adic square if and only if u 1, 2, or 4 mod 7. This result can have roblems when = 2 because 2a 0 mod 2. In fact the lifting of a square root mod 2 to a 2-adic square root really does have a roblem: 3 1 2 mod 2 but 3 is not a square in Z 2 since 3 is not a square mod 4 (the squares mod 4 are 0 and 1). And 3 is not a square in Q 2 either because a hyothetical square root in Q 2 would have to be in Z 2 : if α 2 = 3 in Q 2 then α 2 2 = 3 2 = 1, so α 2 = 1, and thus α Z 2 Z 2. Examle 2.8. For each integer k between 0 and 1, k k mod. Letting f(x) = X X, we have f(k) 0 mod and f (k) = k 1 1 1 0 mod. Hensel s lemma imlies that there is a unique ω k Z such that ω k = ω k and ω k k mod. For instance,
4 KEITH CONRAD ω 0 = 0 and ω 1 = 1. When > 2, ω 1 = 1. Other ω k for > 2 are more interesting. For = 5, ω k is a root of X 5 X = X(X 4 1) = X(X 1)(X + 1)(X 2 + 1). Thus ω 2 and ω 3 are square roots of 1 in Z 5 : ω 2 = 2 + 5 + 2 5 2 + 5 3 + 3 5 4 + 4 5 5 + 2 5 6 + 3 5 7 +, ω 3 = 3 + 3 5 + 2 5 2 + 3 5 3 + 5 4 + 2 5 6 + 5 7 +. The numbers ω k for 0 k 1 are distinct since they are already distinct when reduced mod, so X X = X(X 1 1) slits comletely in Z [X]. Its roots in Z are 0 and -adic ( 1)th roots of unity. 3. Roots of unity in Q via Hensel s Lemma Hensel s lemma is often considered to be a method of finding roots to olynomials, but that is just one asect: the existence of a root. There is also a uniqueness art to Hensel s lemma: it tells us there is a unique root within a certain distance of an aroximate root. We ll use the uniqueness to find all of the roots of unity in Q. Theorem 3.1. The roots of unity in Q are the ( 1)th roots of unity for odd and ±1 for = 2. Proof. If x n = 1 in Q then x n = 1, so x = 1. This means every root of unity in Q lies in Z. Therefore we work in Z right from the start. First let s consider roots of unity of order relatively rime to. Assume ζ 1 and ζ 2 are roots of unity in Z with order rime to. Letting m be the roduct of the orders of these roots of unity, they are both roots of f(x) = X m 1 and m is rime to. Since f (ζ i ) = mζi m 1 = 1, the uniqueness asect of Hensel s lemma imlies that the only root α of X m 1 satisfying α ζ 1 < 1 is ζ 1. So if ζ 2 ζ 1 mod Z then ζ 2 = ζ 1 : distinct roots of unity in Z having order rime to must be incongruent mod. In Examle 2.8 we found in each nonzero coset mod Z a root of X 1 1, and 1 is rime to. Therefore each congruence class mod Z contains a ( 1)th root of unity, so the only roots of unity of order rime to in Q are the roots of X 1 1. Now we consider roots of unity of -ower order. We will show the only th root of unity in Z is 1 for odd and the only 4th roots of unity in Z 2 are ±1. This imlies the only th ower roots of unity in Z are 1 for odd and ±1 for = 2. (For instance, if there were a nontrivial -th ower root of unity in Q for 2 then there would be a root of unity in Q of order, but we re going to show there aren t any of those.) We first consider odd and suose ζ = 1 in Z. Since ζ ζ mod Z, we have ζ 1 mod Z. For the olynomial f(x) = X 1 we have f (ζ) = ζ 1 = 1/, so the uniqueness in Hensel s lemma imlies that the ball {x Q : x ζ < f (ζ) } = {x Q : x ζ 1/ 2 } = ζ + 2 Z contains no th root of unity excet for ζ. We will now show that ζ 1 mod 2 Z, so 1 is in that ball and therefore ζ = 1. Write ζ = 1 + y, where y Z. Then 1 ( ) 1 = ζ = (1 + y) = 1 + (y) + (y) k + (y). k For 2 k 1, ( k) is divisible by, so all terms in the sum over 2 k 1 are divisible by 3. The last term (y) is also divisible by 3 (since 3). Therefore if we reduce the above equation modulo 3 we get k=2 1 1 + 2 y mod 3 = 0 2 y mod 3.
HENSEL S LEMMA 5 Therefore y is divisible by, so ζ 1 mod 2 and this forces ζ = 1. Now we turn to = 2. We want to show the only 4th roots of unity in Z 2 are ±1. This won t use Hensel s lemma. If ζ Z 2 is a 4th root of unity and ζ ±1 then ζ2 = 1, so ζ 2 1 mod 4Z 2. However, ζ Z 2 = ζ 1 or 3 mod 4Z 2 = ζ 2 1 mod 4Z 2 and 1 1 mod 4Z 2. For each rime, a root of unity is a (unique) roduct of a root of unity of -ower order and a root of unity of order rime to, so the only roots of unity in Q are the roots of X 1 1 for 2 and ±1 for = 2. 4. A stronger version of Hensel s Lemma The hyotheses of Theorem 2.1 are 0 mod and f (a) 0 mod. This means a mod is a simle root of f(x) mod. We will now discuss a more general version of Hensel s lemma than Theorem 2.1. It can be alied to cases where a mod is a multile root of f(x) mod : 0 mod and f (a) 0 mod. This will allow us to describe squares in Z 2 and, more generally, th owers in Z. Theorem 4.1 (Hensel s lemma). Let f(x) Z [X] and a Z satisfy < f (a) 2. There is a unique α Z such that f(α) = 0 and α a < f (a). Moreover, (1) α a = /f (a) < f (a), (2) f (α) = f (a). Since f (a) Z, f (a) 1. If f (a) = 1 then the hyotheses of Theorem 4.1 reduce to those of Theorem 2.1: saying < 1 and f (a) = 1 means 0 mod and f (a) 0 mod. Theorem 4.1 actually goes beyond the conclusions of Theorem 2.1 when the hyotheses of Theorem 2.1 hold, since we learn in Theorem 4.1 exactly how far away the root α is from the aroximate root a. But the main oint of Theorem 4.1 is that it allows for the ossibility that f (a) < 1, which isn t covered by Theorem 2.1 at all. We will rove Theorem 4.1 by two methods, in Sections 5 and 6. Here are some alications where the olynomial has a multile root mod. Examle 4.2. Let f(x) = X 4 7X 3 +2X 2 +2X +1. Then f(x) (X +1) 2 (X 2 +1) mod 3 and we notice 2 mod 3 is a double root. Since f(2) 3 = 1/27 and f (2) 3 = 42 3 = 1/3, the condition f(2) 3 < f (2) 2 3 holds, so there is a unique root α of f(x) in Z 3 such that α 2 3 < 1/3, i.e., α 2 mod 9. In fact there are two roots of f(x) in Z 3 : 2 + 3 + 2 3 2 + 2 3 3 + 2 3 4 + 2 3 6 + and 2 + 3 2 + 3 4 + 2 3 5 + 2 3 6 +. The second root reduces to 2 mod 9, and is α. The first root reduces to 5 mod 9, and its existence can be verified by checking f(5) 3 = 1/27 < f (5) 2 3 = 1/9. Examle 4.3. Let f(x) = X 3 10 and g(x) = X 3 5. We have f(x) (X 1) 3 mod 3 and g(x) (X 2) 3 mod 3: 1 is an aroximate 3-adic root of f(x) and 2 is an aroximate 3-adic root of g(x). We want to see if they can be refined to genuine 3-adic roots. The basic form of Hensel s lemma in Theorem 2.1 can t be used since the olynomials do not have simle roots mod 3. Instead we will try to use the stronger form of Hensel s lemma in Theorem 4.1. Since f(1) 3 = 1/9 and f (1) 3 = 1/3, we don t have f(1) 3 < f (1) 2 3, so Theorem 4.1 can t be used on f(x) with a = 1. However, f(4) 3 = 1/27 and f (4) 3 = 1/3, so we can
6 KEITH CONRAD use Theorem 4.1 on f(x) with a = 4: there is a unique root α of X 3 10 in Z 3 satisfying α 4 3 < 1/3, so α 4 mod 9. The exansion of α begins as 1 + 3 + 3 2 + 2 3 6 + 3 7 +. Turning to g(x), we have g(2) 3 = 1/3 and g (2) 3 = 1/9, so we can t aly Theorem 4.1 with a = 2. In fact there is no root of g(x) in Q 3. If there were a root α in Q 3 then α 3 = 5, so α 3 = 1, and thus α Z 3. Then α 3 5 mod 3 n, so 5 would be a cube modulo every ower of 3. But 5 is not a cube mod 9 (the only cubes mod 9 are 0, 1, and 8). Therefore the mod 3 root of X 3 5 does not lift to a 3-adic root of X 3 5. Theorem 4.4. If u Z 2 then u is a square in Q 2 if and only if u 1 mod 8Z 2. Proof. If u = v 2 in Q 2 then 1 = v 2 2, so v Z 2. In Z 2/8Z 2 = Z/8Z, the units are 1, 3, 5, and 7, whose squares are all congruent to 1 mod 8, so u = v 2 1 mod 8Z 2. To show, conversely, that all u Z 2 satisfying u 1 mod 8Z 2 are squares in Z 2, let f(x) = X2 u and use a = 1 in Theorem 4.1. We have f(1) 2 = 1 u 2 1/8 and f (1) 2 = 2 2 = 1/2, so f(1) 2 < f (1) 2 2. Therefore X2 u has a root in Z 2, so u is a square in Z 2. Theorem 4.5. If 2 and u Z, then u is a th ower in Q if and only if u is a th ower modulo 2. Theorem 4.5 is false for = 2: the criterion for an element of Z 2 to be a 2-adic square needs modulus 2 3, not modulus 2 2. For instance, 5 is a square mod 4 but 5 is not a 2-adic square since 5 1 mod 8. Theorem 4.5 exlains what we found in Examle 4.3: 10 is a 3-adic cube and 5 is not, since 10 mod 9 is a cube and 5 mod 9 is not a cube. Proof. If u = v in Q then 1 = v, so v Z : we only need to look for th roots of u in Z. Let f(x) = X u. In order to use Theorem 4.1 on f(x), we seek an a Z such that < f (a) 2. This means a u < a 1 2 = 1/ 2, or equivalently a u mod 3. So rovided u is a th ower modulo 3, Theorem 4.1 tells us that X u has a root in Z, so u is a th ower. The criterion in the theorem, however, has modulus 2 rather 3. We need to do some work to bootstra an aroximate th root from modulus 2 to modulus 3 in order for Theorem 4.1 to aly. Suose u a mod 2 for some a Z. Then a Z and u/a 1 mod 2. Write u/a 1 + 2 c mod 3, where 0 c 1. By the binomial theorem, ( ) (1 + c) = 1 + (c) + (c) k. k k=2 The terms for k 3 are obviously divisible by 3 and the term at k = 2 is ( ) 2 (c) 2 = 1 2 3 c 2, which is also divisible by 3 since > 2. Therefore (1 + c) 1 + 2 c mod 3, so u/a (1 + c) mod 3. Now we can write u a (1 + c) 1 mod 3. From Theorem 4.1, a -adic integer that is congruent to 1 mod 3 is a th ower (see the first aragrah again). Thus u/(a (1 + c) ) is a th ower, so u is a th ower. Remark 4.6. Hensel s lemma in Theorem 4.1 guarantees a root mod lifts uniquely to a root in Z under weaker conditions than for Hensel s lemma in Theorem 2.1, but it does not guarantee a unique lift mod n, unlike for Theorem 2.1 (Remark 2.3). For examle, for 2 the unique solution to x = 1 in Z is 1, but for n > 1 the congruence x 1 mod n has solutions: 1 + n 1 b mod n for 0 b 1. The unique solution 1 mod of x 1 mod lifts to solutions of x 1 mod n for each n > 1. (What haens if = 2?) This is consistent with a unique lift to a root in Z since 1 + n 1 b 1 as n.
HENSEL S LEMMA 7 5. First Proof of Theorem 4.1: Newton s Method Our first roof of Theorem 4.1 will use Newton s method and is a modification of [4, Cha. II, 2, Pro. 2]. Proof. As in Newton s method from real analysis, define a sequence {a n } in Q by a 1 = a and (5.1) a n+1 = a n f(a n) f (a n ) for n 1. Set t = /f (a) 2 < 1. We will show by induction on n that (i) a n 1, i.e., a n Z, (ii) f (a n ) = f (a 1 ), (iii) f(a n ) f (a 1 ) 2 t 2n 1. For n = 1 these conditions are all clear. Note in articular that we have equality in (iii) and f (a 1 ) 0 since f(a 1 ) < f (a 1 ) 2. For the inductive ste, we need two olynomial identities. The first one is f(x + Y ) = f(x) + f (X)Y + g(x, Y )Y 2 for some g(x, Y ) Z [X, Y ], which is derived in exactly the same way as in the roof of Theorem 2.1. Then (5.2) x, y Z = f(x + y) = f(x) + f (x)y + zy 2, where z Z. The second olynomial identity we need is that for all F (X) Z [X], F (X) F (Y ) = (X Y )G(X, Y ) for some G(X, Y ) Z [X, Y ]. This comes from X Y being a factor of X i Y i for all i 1. Writing F (X) = m i=0 b ix i, m F (X) F (Y ) = b i (X i Y i ) and we can factor X Y out of each term on the right. For x and y in Z, G(x, y) Z, so (5.3) x, y Z = F (x) F (y) = x y G(x, y) x y. Assume (i), (ii), and (iii) are true for n. To rove (i) for n + 1, first note a n+1 is defined since f (a n ) 0 by (ii). To rove (i) it suffices to show f(a n )/f (a n ) 1. Using (ii) and (iii) for n, we have f(a n )/f (a n ) = f(a n )/f (a 1 ) f (a 1 ) t 2n 1 1. To rove (ii) for n + 1, (iii) for n imlies f(a n ) < f (a 1 ) 2 since t < 1, so by (5.3) with F (X) = f (X), f (a n+1 ) f (a n ) a n+1 a n = f(a n) f (a n ) = f(a n) f (a 1 ) < f (a 1 ) so f (a n+1 ) = f (a 1 ). To rove (iii) for n + 1, we use (5.2) with x = a n and y = f(a n )/f (a n ): ( f(a n+1 ) = f(x + y) = f(a n ) + f (a n ) f(a ) ( ) n) f(an ) 2 ( ) f(an ) 2 f + z (a n ) f = z (a n ) f, (a n ) where z Z. Thus, by (iii) for n, f(a n+1 ) f(a n ) f (a n ) This comletes the induction. 2 i=1 = f(a n) 2 f (a 1 ) 2 f (a 1 ) 4 t 2n f (a 1 ) 2 = f(a 1 ) 2 t 2n.
8 KEITH CONRAD Now we show {a n } is Cauchy in Q. From the recursive definition of this sequence, (5.4) a n+1 a n = f(a n ) f (a n ) = f(a n) f f (a 1 ) t 2n 1, (a 1 ) where we used (ii) and (iii). Thus {a n } is Cauchy. Let α be its limit, so α 1 by (i), i.e., α Z. Letting n in (ii) and (iii) we get f (α) = f (a 1 ) = f (a) and f(α) = 0. To show α a = /f (a), it s true if = 0 since then a n = a 1 = a for all n, so α = a. If 0 then we will show a n a = /f (a) for all n 2 and let n. When n = 2 use the definition of a 2 in terms of a 1 = a. For all n 2, by (5.4) (5.5) a n+1 a n f (a 1 ) t 2n 1 f (a 1 ) t 2 < f (a 1 ) t = f (a) t = f (a), where t (0, 1) since 0. 2 If a n a = /f (a) then a n+1 a n < a n a by (5.5), so a n+1 a = (a n+1 a n ) + (a n a) = a n a = /f (a). The last thing to do is show α is the only root of f(x) in the ball {x Z : x a < f (a) }. This will not use anything about Newton s method. Assume f(β) = 0 and β a < f (a). Since α a < f (a) we have β α < f (a). Write β = α + h, so h Z. Then by (5.2), 0 = f(β) = f(α + h) = f(α) + f (α)h + zh 2 = f (α)h + zh 2 for some z Z. If h 0 then f (α) = zh, so f (α) h = β α < f (a). But f (α) = f (a), so we have a contradiction. Thus h = 0, i.e., β = α. Before we give a second roof of Theorem 4.1, it s worth noting that the a n s converge to α very raidly. From the inequality a n+1 a n f (a 1 ) t 2n 1 for all n 1 we obtain by the strong triangle inequality a m a n f (a 1 ) t 2n 1 for all m > n. Letting m, (5.6) α a n f (a 1 ) t 2n 1 = f (a) t 2n 1 = f 2 n 1 (a) f (a) 2. Since /f (a) 2 < 1, the exonent 2 n 1 tells us that the number of initial -adic digits in a n that agree with those in the limit α is at least doubling at each ste. Examle 5.1. Let f(x) = X 2 7 in Q 3 [X]. It has two roots in Z 3 : α 1 = 1 + 3 + 3 2 + 2 3 4 + 2 3 7 + 3 8 + 3 9 +, α 2 = 2 + 3 + 3 2 + 2 3 3 + 2 3 5 + 2 3 6 + 3 8 + 3 9 +. Starting with a 1 = 1, for which f(a 1 )/f (a 1 ) 2 3 = 1/3, Newton s recursion (5.1) has limit α where α a 1 3 < f (a 1 ) 3 = 1, so α a 1 mod 3. Thus a n α 1. For examle, a 4 = 977 368 = 1 + 3 + 32 + 2 3 4 + 2 3 7 + 3 9 + 3 10 +, which has the same 3-adic digits as α 1 u through terms including 3 7 (the first 8 digits). The estimate in (5.6) says α 1 a n 3 f (a 1 ) 3 (1/3) 2n 1 = (1/3) 2n 1 for all n. Using a comuter, this inequality is an equality for 1 n 10. Examle 5.2. Let f(x) = X 2 17 in Q 2 [X]. It has two roots in Z 2 : α 1 = 1 + 2 3 + 2 5 + 2 6 + 2 7 + 2 9 +, α 2 = 1 + 2 + 2 2 + 2 4 + 2 8 +. 2 The case = 0 was not covered in an earlier draft and this omission was found when this roof was formalized using the Lean theorem rover [5,. 8].
HENSEL S LEMMA 9 Using Newton s recursion (5.1) for f(x) with initial seed a Z 2, we need a2 17 2 < 2a 2 2, which is the same as a 2 17 mod 8, and this congruence works for all a Z 2. Therefore (5.1) with a 1 Z 2 converges to α 1 or α 2. Since f (a) 2 = 1/2 for a Z 2, (5.1) with a 1 = a has a limit α satisfying α a 2 < f (a) 2 = 1/2, so α a mod 4: if a 1 mod 4 then α = α 1, and if a 3 mod 4 then α = α 2. By (5.6), α a n 2 f (a) 2 ( /f (a) 2 2 ) 2n 1 = (1/2)(4 a 2 17 2 ) 2n 1. For a few choices of a, this inequality is an equality for 1 n 10: When a = 1, α 1 a n 2 = (1/2) 2n +1 for 1 n 10. When a = 3, α 2 a n 2 = (1/2) 2n 1 +1 for 1 n 10. When a = 5, α 1 a n 2 = (1/2) 2n 1 +1 for 1 n 10. Examle 5.3. Let s solve X 2 1 = 0 in Q 3. This might seem silly, since we know the solutions are ±1, but let s check how an initial aroximation affects the 3-adic limit. We use a 1 = 2. (If we used a 1 = 1 then a n = 1 for all n, which is not interesting.) When f(x) = X 2 1 the recursion for Newton s method is a n+1 = a n f(a n) f (a n ) = a n a2 n 1 2a n = a2 n + 1 2a n = 1 2 ( a n + 1 ). a n Since f(2) 3 = 1/3 < f (2) 2 3, Newton s recursion with a 1 = 2 converges in Q 3. What is the limit? Since a 1 1 mod 3, we have a n 1 mod 3 for all n by induction: if a n 1 mod 3 then a n+1 = (1/2)(a n + 1/a n ) (1/2)( 1 + 1/( 1)) (1/2)( 2) 1 mod 3. Thus lim n a n = 1 in Q 3. What makes this interesting is that in R all a n are ositive so the sequence of rational numbers {a n } converges to 1 in R and to 1 in Q 3. The table below illustrates the raid convergence in R and Q 3 (the 3-adic exansion of 1 is 2 = 2222...). n a n Decimal arox. 3-adic arox. 1 2 2.00000000 20000000 2 5/4 1.25000000 22020202 3 41/40 1.02500000 22220222 4 3281/3280 1.00030487 22222222 6. Second Proof of Theorem 4.1: Contraction Maings Newton s method roduces a sequence converging to a root of f(x) by iterating the function x f(x)/f (x) with initial value a 1 = a where < f (a) 2. A root can also be found with a different iteration ϕ(x) = x f(x)/f (a), where again < f (a) 2. The denominator is f (a), not f (x), so it doesn t change. We will show ϕ is a contraction maing on a suitable ball around a. Then the contraction maing theorem will imly ϕ has a (unique) fixed oint α in that ball. 3 The condition ϕ(α) = α says α f(α)/f (a) = α, so f(α) = 0 and we have a root of f(x). Filling in the details leads to the following second roof of Theorem 4.1. If you re not interested in a second roof, go to the next section. Proof. For r [0, 1] to be determined, set B a (r) = {x Q : x a r}, so B a (r) Z. Set ϕ(x) = x f(x) f (a). We seek an r such that ϕ mas B a (r) back to itself and is a contraction on that ball. To show ϕ is a contraction on some ball around a, we want to estimate ϕ(x) ϕ(y) = f(x) f(y) x y f (a) 3 There is an analogous method in real analysis to simlify Newton s method to the contraction maing theorem by fixing the denominator in the recursion. See [2, Thm. 1.2,. 164].
10 KEITH CONRAD for all x and y near a in order to make this λ x y for some λ < 1. Write f(x) as a olynomial in X a, say f(x) = d i=0 b i(x a) i. Then b 0 =, b 1 = f (a), and b i Z for i > 1. For all x and y in Q, so f(x) f(y) = b i ((x a) i (y a) i ) = f (a)(x y) + i=1 (6.1) ϕ(x) ϕ(y) = x y f(x) f(y) f (a) = 1 f (a) b i ((x a) i (y a) i ), i=2 b i ((x a) i (y a) i ). (If d 1 then the sum on the right is emty.) In the olynomial identity i=2 i 1 X i Y i = (X Y ) X i 1 j Y j for i 2 set X = x a and Y = y a. Then i 1 (x a) i (y a) i = x y (x a) i 1 j (y a) j j=0 j=0 x y max x a i 1 j y a j 0 j i 1 x y max( x a, y a ) i 1 x y max( x a, y a ) when x a and y a are both at most 1. Therefore from (6.1), so for λ [0, 1), x a, y a 1 = ϕ(x) ϕ(y) x y max( x a, y a ) f (a), (6.2) x a, y a λ f (a) = ϕ(x) ϕ(y) λ x y. If we can find λ [0, 1), erhas deending on a and f(x), such that (6.3) x a λ f (a) = ϕ(x) a λ f (a) then (6.2) will tell us that ϕ is a contraction maing on the closed ball around a of radius λ f (a). We will see that if < f (a) 2 then a choice for λ is /f (a) 2. In fact, we want to do more: show the condition < f (a) 2 arises naturally by trying to make (6.3) work for some unknown λ. For λ (0, 1), when x a λ f (a) we have ϕ(x) a λ f (a) x a f(x) f (a) λ f (a) f(x) f (a) λ f (a). Returning to the formula f(x) = d i=0 b i(x a) i, where b 0 = and b 1 = f (a), (6.4) f(x) f (a) = f (a) + (x a) + b i f (a) (x a)i. When x a λ f (a), which is less than f (a) 1, we have for i 2 that b i f (x a)i (a) x a 2 f λ 2 f (a) λ f (a), (a) i=2
so by (6.4) HENSEL S LEMMA 11 f(x) f (a) λ f (a) f (a) λ f (a) f (a) 2 λ. To make this occur for some λ < 1 is equivalent to requiring /f (a) 2 < 1. Therefore if < f (a) 2 and we set λ = /f (a) 2, the maing ϕ(x) = x f(x)/f (a) is a contraction on the closed ball around a with radius λ f (a) = /f (a) and contraction constant λ. Any closed ball in Q is comlete, so the contraction maing theorem imlies that the sequence {a n } defined recursively by a 1 = a and (6.5) a n+1 = ϕ(a n ) = a n f(a n) f (a) for n 1 converges to the unique fixed oint of ϕ in B a ( /f (a) ). By the definition of ϕ, a fixed oint of ϕ is the same thing as a zero of f(x), so there is a unique α in Q satisfying f(α) = 0 and α a /f (a). To finish this roof of Theorem 4.1 we need to show α a = /f (a), α is the unique root of f(x) in Z such that α a < f (a), and f (α) = f (a). α a = /f (a) : We have a 2 a = a 2 a 1 = f(a 1 )/f (a) = /f (a), and for all n a n+1 a n = ϕ n (a) ϕ n 1 (a) = ϕ n 1 (ϕ(a)) ϕ n 1 (a) λ n 1 a 2 a 1 < f (a). Then a n a = /f (a) for all n by induction, so α a = /f (a) by letting n. α is the unique root of f(x) such that α a < f (a) : This was roved at the end of the first roof of Theorem 4.1 without needing Newton s method. We won t rewrite the roof. f (α) = f (a) : Since a 1 = a, of course f (a 1 ) = f (a). If f (a n ) = f (a) for some n 1 then f (a n+1 ) f (a n ) a n+1 a n f (a) where the first inequality is from (5.3) and the second inequality is from a n and a n+1 both lying in the closed ball around a of radius /f (a). Thus f (a n+1 ) f (a n ) < f (a) = f (a n ), so f (a n+1 ) = f (a n ) = f (a). We ve shown f (a n ) = f (a) for all n, and letting n gives us f (α) = f (a). It s worthwhile to comare the recursions from Newton s method and from the contraction maing theorem: Newton s method: a n+1 = a n f(a n) f (a n ), with a 1 = a and < f (a) 2. Contraction maing: a n+1 = a n f(a n) f (a), with a 1 = a and < f (a) 2. The difference is the denominators f (a n ) and f (a), and this has a rofound effect on the rate of convergence. In Newton s method, (5.6) tells us (6.6) α a n f 2 n 1 (a) f (a) 2. To find an estimate on α a n from the contraction maing theorem, the recursion given by (6.5) imlies a n+1 a n = ϕ n 1 (a 2 ) ϕ n 1 (a 1 ) a 2 a 1 λ n 1 for all n 1, where
12 KEITH CONRAD a 2 a 1 = /f (a) and λ = /f (a) 2. If m > n the strong triangle inequality imlies a m a n max a j+1 a j a 2 a 1 λ n 1 = n j m 1 f (a) λ n 1. Letting m yields (6.7) α a n f (a) f (a) 2 Since the uer bound in (6.6) goes to 0 much faster than the uer bound in (6.7), we can anticiate that α a n would shrink to 0 faster when {a n } is roduced from Newton s recursion comared to the contraction maing recursion. In articular, when /f (a) 2 = 1/, the decay bound in (6.6) goes to 0 like (1/) 2n 1 while in (6.7) the decay bound goes to 0 like (1/) n, so we exect there to be a doubling of correct -adic digits at each ste in Newton s recursion while we get just one new correct -adic digit at each ste by the contraction maing recursion. Let s see an examle. Examle 6.1. Let f(x) = X 2 7 with a = 1, so /f (a) 3 = 1/3. The sequence {a n } roduced from the contraction maing recursion (6.5) with a 1 = 1 has a limit α and with a comuter we find α a n 3 = (1/3) n for 1 n 10. The sequence {a n } based on Newton s recursion in Examle 5.1 with a 1 = 1 has the same limit α, but a comuter tells us that α a n 3 = (1/3) 2n 1 for 1 n 10, which is much smaller than (1/3) n. n 1 7. Necessity of < f (a) 2 In Theorem 4.1 we have f (α) = f (a) 0, so Hensel s lemma roduces a simle root of f(x) in Z that is close to a. The criterion < f (a) 2 in Theorem 4.1 is not just a sufficient condition for there to be a simle root of f(x) near a but it is also more or less necessary, as the next theorem makes recise. Theorem 7.1. If f(x) Z [X] has a simle root α in Z, then for all a Z that are close enough to α we have f (a) = f (α) and < f (a) 2. In articular, these conditions hold when a α < f (α). Proof. We have f (α) f (a) α a < f (α), so f (a) = f (α). Also = f(α + (a α)) = f(α) + f (α)(a α) + z(a α) 2 = f (α)(a α) + (a α) 2 z for some z Z. Each term on the right side has absolute value less than f (α) 2, so also < f (α) 2 = f (a) 2. 8. Hensel s lemma beyond Q Both of our roofs of Hensel s lemma, as well as the roof of Theorem 7.1, work in every field comlete with resect to an absolute value satisfying the strong triangle inequality. Theorem 8.1. Let K be a field that is comlete with resect to an absolute value such that x + y max( x, y ) for all x and y in K. Set o = {x K : x 1}. If a olynomial f(x) has coefficients in o and some a o satisfies < f (a) 2 then there is a unique α o such that f(α) = 0 and α a < f (a). Moreover, α a = /f (a) < f (a) and f (α) = f (a). Conversely, if f(x) has a simle root α o, then for a K such that a α < f (α) we have f (a) = f (α) and < f (a) 2..
HENSEL S LEMMA 13 For a version of Hensel s lemma where Z is relaced by a comlete local ring (ossibly not an integral domain, so division is more delicate), see [3, Theorem 7.3]. For a form of Hensel s lemma dealing with zeros of several olynomials (or ower series) in several variables, see [1, Cha. III, 4.3]. References [1] N. Bourbaki, Commutative Algebra, Sringer-Verlag, New York, 1989. [2] C. H. Edwards, Jr., Advanced Calculus of Several Variables, Academic Press, New York, 1973. [3] D. Eisenbud, Commutative Algebra with a View to Algebraic Geometry, Sringer-Verlag, New York, 1995. [4] S. Lang, Algebraic Number Theory, 3rd ed., Sringer-Verlag, New York, 1994. [5] R. Y. Lewis, A Formal Proof of Hensel s Lemma over the -adic Integers, online at htt://roberty lewis.com/adics/adics.df.