PRIME GENERATING LUCAS SEQUENCES

PAUL LIU & RON ESTRIN
Science One Program, The University of British Columbia, Vancouver, Canada
April 2011

Abstract. The distribution of prime numbers in Lucas sequences was investigated by independently changing the initial values and the multiplicative constants in the recursive definition of the sequence. The prime distribution was obtained by counting the number of primes in the first 1000 terms of various Lucas sequences. Both smaller seeds and smaller multipliers were found to produce more primes on average than large seeds and multipliers. Changing the initial seeds was also found to produce more primes, and more variation in prime counts, than changing the multipliers.

Introduction. The search for prime numbers is an active area of mathematics that appears in several fields, most prominently number theory. Primes in Lucas sequences have been of interest for some time, and have led to several discoveries and to primality tests such as the Lucas-Lehmer test. Primes have also found practical applications, for example in algorithms and methods for data encryption. Let us define (informally) Lucas sequences as the terms of the recursive sequence

$$M(a, b, P, Q):\quad a_n = P a_{n-1} - Q a_{n-2}, \qquad a_1 = a,\ a_2 = b. \tag{1}$$

Among the best-known Lucas sequences are the Fibonacci sequence ($a_n = a_{n-1} + a_{n-2}$, $a_1 = 0$, $a_2 = 1$) and its companion, the Lucas sequence ($a_1 = 2$, $a_2 = 1$); in the notation of (1), both have $P = 1$ and $Q = -1$. Lucas sequences can also be defined through their characteristic polynomial $X^2 - PX + Q$, whose roots

$$\alpha = \frac{P + \sqrt{D}}{2}, \qquad \beta = \frac{P - \sqrt{D}}{2}, \qquad D = P^2 - 4Q,$$

generate the companion Lucas sequences

$$U_n = \frac{\alpha^n - \beta^n}{\alpha - \beta} \quad (a_1 = 0,\ a_2 = 1), \qquad V_n = \alpha^n + \beta^n \quad (a_1 = 2,\ a_2 = P).$$

For $P = 3$, $Q = 2$ the Lucas sequence is $U_n = 2^n - 1$; prime values of $U_n$ are known as Mersenne primes, and $U_{43112609} = 2^{43112609} - 1$ [2] is the largest prime known at the time of writing. Primes in Lucas sequences can therefore display remarkable behavior.
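To make the sign convention of (1) concrete, the recurrence can be sketched in a few lines of Python. This is an illustration only, not the Mathematica code used for the search, and the name `lucas_terms` is our own. Note that with $a_1 = 0$ the listed terms are $U_0, U_1, \ldots$, so the $k$-th entry (counting from 0) of the $P = 3$, $Q = 2$ sequence is $2^k - 1$.

```python
def lucas_terms(a: int, b: int, P: int, Q: int, count: int) -> list[int]:
    """Return the first `count` terms of M(a, b, P, Q) from equation (1):
    a_1 = a, a_2 = b, and a_n = P*a_{n-1} - Q*a_{n-2} for n >= 3."""
    terms = [a, b][:count]
    while len(terms) < count:
        terms.append(P * terms[-1] - Q * terms[-2])
    return terms


# Fibonacci numbers and their companion Lucas numbers (P = 1, Q = -1)
print(lucas_terms(0, 1, 1, -1, 10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(lucas_terms(2, 1, 1, -1, 10))  # [2, 1, 3, 4, 7, 11, 18, 29, 47, 76]

# P = 3, Q = 2 with seeds 0, 1: the k-th listed term (k from 0) is 2^k - 1
mersenne = lucas_terms(0, 1, 3, 2, 10)
print(mersenne == [2**k - 1 for k in range(10)])  # True
```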
This paper investigates trends in prime density in general Lucas sequences in two respects: different initial values $a_1, a_2$, and different multipliers $P, Q$. Several observations can be made before the search begins that restrict the search space.

Consider seeds $a_1 = a$, $a_2 = b$ with $P = 1$, $Q = -1$, chosen so that the sequence is not the Fibonacci sequence. The sequence is then

$$M(a, b, 1, -1):\quad a,\ b,\ a + b,\ a + 2b,\ 2a + 3b,\ 3a + 5b,\ \ldots \tag{2}$$

Compare this with the Fibonacci sequence $F_n = 1, 1, 2, 3, 5, 8, \ldots$ We see that sequence (2) is simply a combination of two Fibonacci sequences:

$$M(a, b, 1, -1):\quad a_n = a F_{n-2} + b F_{n-1}, \tag{3}$$

with the conventions $F_0 = 0$ and $F_{-1} = 1$, so that (3) also holds for $n = 1, 2$. This sequence will be called the general Fibonacci sequence, $G_n(a, b)$, with $a_1 = a$ and $a_2 = b$. Let $L_n$ denote the sequence obtained from $G_n(2, 1)$ (the Lucas numbers).

Hence we make our first important observation: if $a$ and $b$ are not coprime (they share a common divisor $d > 1$), there is no hope of obtaining any primes, except for the initial values if they happen to be prime. Indeed, writing $a = dr$ and $b = ds$ for some integers $r, s$,

$$a_n = a F_{n-2} + b F_{n-1} = d\,(r F_{n-2} + s F_{n-1}),$$

so every term is a multiple of $d$. The same observation holds for the multipliers: if $P$ and $Q$ share a common divisor $d > 1$, then every term $a_n = P a_{n-1} - Q a_{n-2}$ with $n \ge 3$ is a multiple of $d$, and so it cannot be prime unless it equals $d$ itself.
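Both the decomposition (3) and the two divisibility observations are easy to check numerically. The following Python sketch does so for a few arbitrarily chosen seeds and multipliers; the helper names `fib` and `general_fib` are our own, and `fib` is extended to $F_{-1} = 1$ exactly as in the convention stated above.

```python
from math import gcd


def fib(n: int) -> int:
    """F_n with F_0 = 0, F_1 = 1, plus F_{-1} = 1 so that equation (3) holds for n = 1."""
    if n == -1:
        return 1
    if n < -1:
        raise ValueError("only n >= -1 is supported")
    x, y = 0, 1
    for _ in range(n):
        x, y = y, x + y
    return x


def general_fib(a: int, b: int, count: int) -> list[int]:
    """First `count` terms of G_n(a, b): a, b, a + b, a + 2b, ..."""
    terms = [a, b][:count]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms


# Equation (3): G_n(a, b) = a*F_{n-2} + b*F_{n-1}
a, b = 4, 7
direct = general_fib(a, b, 12)
via_formula = [a * fib(n - 2) + b * fib(n - 1) for n in range(1, 13)]
print(direct == via_formula)  # True

# Seeds sharing d = gcd(6, 9) = 3: every term is a multiple of 3
print(all(t % gcd(6, 9) == 0 for t in general_fib(6, 9, 50)))  # True

# Multipliers sharing a divisor: P = 4, Q = 10 (gcd 2), a_n = P*a_{n-1} - Q*a_{n-2}
terms = [1, 1]
for _ in range(40):
    terms.append(4 * terms[-1] - 10 * terms[-2])
print(all(t % 2 == 0 for t in terms[2:]))  # True: every term from a_3 on is even
```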

Prime numbers are well known to appear quite irregularly. Let us define the Prime Density Distribution as the frequency of each prime count (the number of primes among the first 1000 terms of a sequence) divided by the total number of data points. The $n$th prime grows approximately like $n \ln n$, so as our seeds and multipliers become large we expect the number of primes within the first 1000 terms of the sequence to decrease. Furthermore, the Prime Number Theorem (PNT) implies that the probability that a number near $N$ is prime is approximately $1/\ln N$, so the prime density should decrease as our multipliers and initial values become large.

Methods. The data were collected through computational searches written in Mathematica. As primality testing took up the bulk of this project, Mathematica was chosen for its efficient primality-testing function as well as its ability to analyze large amounts of data. To determine the effect of different initial values on the prime density of $G_n(a, b)$, all values $G_n(a, b)$ with $a, b, n < 1000$ were tested for primality. The different seeds $(a, b)$ were then ranked by the number of primes in the first 1000 terms. Since an exhaustive search through such a large space requires substantial computational power, several optimizations were used to speed up the search. Because only coprime pairs produce sequences with any primes, only coprime pairs $(a, b)$ were used. Furthermore, $G_n(a, b)$ with $a > b$ produces a sequence that has already been searched as some $G_n(c, d)$ with $c < d$ (its terms from $b$ onward form $G_n(b, a + b)$), so we imposed the condition $a < b$. Sequences whose seeds are consecutive terms of a previously searched sequence were removed by pruning the list of results after the search. These simple restrictions reduced the search space by over 70%.
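The coprimality and $a < b$ restrictions can be sketched as a simple filter over seed pairs. This is a hypothetical reconstruction, assuming seeds run over $1 \le a, b < 1000$ (the lower bound is our assumption); it does not model the post-search pruning of consecutive-term seeds, so the fraction it reports covers only the two restrictions it implements.

```python
from math import gcd


def restricted_seeds(limit: int) -> list[tuple[int, int]]:
    """Seed pairs kept by the coprimality and a < b restrictions, for 1 <= a < b < limit."""
    return [(a, b) for b in range(2, limit) for a in range(1, b) if gcd(a, b) == 1]


limit = 1000
kept = restricted_seeds(limit)
total = (limit - 1) ** 2  # all ordered pairs (a, b) with 1 <= a, b < limit
print(len(kept), f"pairs kept; {1 - len(kept) / total:.1%} of the space removed")
```

Since about $6/\pi^2 \approx 61\%$ of pairs are coprime and the ordering condition halves that, roughly 30% of the pairs survive these two filters.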
Because of the recursive addition in the general Fibonacci sequence, for coprime seeds there is always one even number for every two odd numbers. Using this fact, the search space was further reduced by testing only odd terms. Additionally, to speed up the explicit computation of $G_n(a, b)$, a lookup table of Fibonacci numbers was used instead of recomputing them. To search through different multipliers for $M_n(a, b, P, Q)$, all values of $M_n(a, b, P, Q)$ with $0 < P, Q < 200$ and $n < 1000$ were tested for primality. The values of $(P, Q)$ were restricted to below 200, as larger multipliers required an inordinate amount of computation time because of the exponential growth of the terms. Again, the multipliers $(P, Q)$ were ranked by the number of primes in the first 1000 terms, and the coprimality requirement was used to reduce the search space. However, since $M_n(a, b, P, Q)$ and $M_n(a, b, Q, P)$ do not produce the same sequences, one cannot impose $P < Q$. Additionally, as $M_n(a, b, P, Q)$ cannot easily be decomposed into Fibonacci sequences, the recursive definition of $M_n$ was used to calculate the terms instead of a lookup table. Then, to gain some understanding of how the initial values change the effect of different multipliers, the Fibonacci seeds with multipliers $P, Q$, written $F_n(P, Q)$, were investigated in one trial, and the Lucas seeds, $L_n(P, Q)$, in another.

Results. At first glance, the histogram of the number of primes within the first 1000 terms resembles a slightly skewed normal distribution, but using Mathematica's FindDistributionParameters function, it was determined that it is not quite such a distribution.
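The counting step behind these histograms, as described in Methods, can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' Mathematica code: `is_probable_prime` is a hand-written Miller-Rabin test standing in for Mathematica's primality test, and all function names are hypothetical. It follows the two optimizations above: terms come from a Fibonacci lookup table via $G_n = aF_{n-2} + bF_{n-1}$, and even terms other than 2 are skipped without testing.

```python
def is_probable_prime(n: int) -> bool:
    """Miller-Rabin with the first twelve prime bases: deterministic below about 3e23,
    probabilistic (but very reliable) for the much larger terms in this search."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def fibonacci_table(size: int) -> list[int]:
    """Lookup table with fib[k] = F_k for 0 <= k < size."""
    fib = [0, 1]
    while len(fib) < size:
        fib.append(fib[-1] + fib[-2])
    return fib[:size]


def count_primes_general_fib(a: int, b: int, count: int, fib: list[int]) -> int:
    """Primes among G_1(a, b), ..., G_count(a, b), using G_n = a*F_{n-2} + b*F_{n-1}."""
    primes = 0
    for n in range(1, count + 1):
        term = a if n == 1 else a * fib[n - 2] + b * fib[n - 1]
        # Even terms (about one in three for coprime seeds) are skipped unless equal to 2
        if term == 2 or (term % 2 == 1 and is_probable_prime(term)):
            primes += 1
    return primes


fib = fibonacci_table(1000)
print(count_primes_general_fib(0, 1, 30, fib))  # 9: F_3, F_4, F_5, F_7, F_11, F_13, F_17, F_23, F_29
print(count_primes_general_fib(2, 1, 1000, fib))  # Lucas numbers L_0, ..., L_999
```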

Figure 1: The probability density of the number of primes in the first 1000 terms of the general Fibonacci sequence with changing initial seeds. The mean is 15.3 primes with a standard deviation of 4.9.

Regardless, the distribution of prime counts appears quite random, considering how closely it resembles a Gaussian. The most prime-rich sequence was $G_n(3, 341)$, which produced 46 primes, followed by $G_n(179, 937)$, which produced 43 primes. These seeds are of intermediate size within the search, but further analysis shows that smaller seeds produce more primes on average.

Figure 2: The probability density of the number of primes in the first 1000 terms of the Lucas sequence (left) and the Fibonacci sequence (right) with changing multipliers; the two appear quite similar. The mean number of primes is 6.16 for Lucas sequences and 6.07 for Fibonacci sequences.

The Lucas sequence thus shows a slightly higher average number of primes, but overall the two are very similar. Interestingly, the most prime-rich Lucas sequences were the one with no multipliers at all, $L_n(1, 1)$, and $L_n(5, 4)$, which each produced 26 primes. They were followed by $L_n(3, 10)$ and $L_n(11, 6)$ with 24 primes. The Fibonacci sequence showed similar trends with low multipliers but produced fewer primes, with $F_n(6, 15)$ and $F_n(11, 90)$ producing only 23 primes. Thus, changing the initial values of the sequences affects prime production much more, and produces many more primes, than changing the multipliers.

Figure 3: The frequency of primes in the smallest 10% of our seed search space (left) and the largest 10% (right). The mean for the smallest seeds is 16.8, compared with 14.7 for the largest 10% of seeds. See the Appendix for how seed size was determined for the histograms.
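The Prime Density Distribution defined earlier (frequency of each prime count divided by the number of data points) and the summary statistics quoted in the figure captions can be computed as in the following Python sketch. It runs a deliberately tiny toy search (coprime seeds $a < b < 20$, first 25 terms) so that plain trial division suffices; the search size and all names are our assumptions, and its output is not comparable to the figures above.

```python
from collections import Counter
from math import gcd, isqrt
from statistics import mean, stdev


def is_prime(n: int) -> bool:
    """Plain trial division; adequate only for the small demonstration values below."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    return all(n % p for p in range(3, isqrt(n) + 1, 2))


def prime_count(a: int, b: int, terms: int) -> int:
    """Number of primes among the first `terms` terms of G_n(a, b)."""
    count, x, y = 0, a, b
    for _ in range(terms):
        if is_prime(x):
            count += 1
        x, y = y, x + y
    return count


def prime_density_distribution(counts: list[int]) -> dict[int, float]:
    """Frequency of each prime count divided by the number of data points (sequences)."""
    freq = Counter(counts)
    return {k: freq[k] / len(counts) for k in sorted(freq)}


# Toy search: coprime seeds a < b < 20, first 25 terms (far smaller than the paper's search)
seeds = [(a, b) for b in range(2, 20) for a in range(1, b) if gcd(a, b) == 1]
counts = [prime_count(a, b, 25) for a, b in seeds]
print(prime_density_distribution(counts))
print(f"mean = {mean(counts):.2f}, standard deviation = {stdev(counts):.2f}")
```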
From the 10617 data points collected by varying the initial values of $G_n$, seeds in the lowest 10% produce about 2 more primes on average than seeds in the highest 10% (16.8 versus 14.7). To determine whether there is a significant difference between these averages, a null hypothesis of no difference was assumed and a Location Equivalence Test was performed. The resulting p-value was approximately $2.8 \times 10^{-5}$, indicating a very significant difference between the means for smaller and larger seeds.

Figure 4: The frequency of primes in the smallest 10% of our multiplier search space (left) and the largest 10% (right). Smaller multipliers yielded an average of 10 primes, 5 more on average than large multipliers. See the Appendix for how the multipliers were separated for the histograms.

From the 4463 data points collected by varying the multipliers of $L_n$, multipliers in the lowest 10% produce 5 more primes on average than the largest multipliers (the same holds when varying the multipliers of the Fibonacci sequence). As with the seeds, a Location Equivalence Test was performed, giving a p-value of $4.8 \times 10^{-67}$, which indicates that the lowest 10% are almost certainly different from the top 10%.

Discussion. From the PNT, for large $N$ the probability that $N$ is prime decreases in inverse proportion to $\ln N$. From the expressions for $V_n$ and $U_n$, the terms of a Lucas sequence grow roughly like $\alpha^n$. Hence

$$P(\text{prime}) \approx \frac{1}{\ln N} \approx \frac{1}{n \ln \alpha},$$

so the probability that a Lucas term is prime decreases roughly like $1/n$. This rough calculation shows that the prime density of a Lucas sequence is intimately connected with the growth of its terms. As expected, the small seeds and small multipliers of the examined Lucas sequences produce significantly more primes on average than the large seeds and large multipliers. Moreover, increasing the size of the multipliers has a much greater effect than increasing the size of the seeds: larger multipliers increase $\alpha$, which enters the exponent, whereas larger seeds only multiply the terms by a roughly constant factor.
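The Location Equivalence Tests reported in Results compare mean prime counts between the smallest and largest 10% of seeds or multipliers. We do not reproduce Mathematica's test here; as a simpler, swapped-in technique, the Python sketch below runs a two-sided permutation test on the difference of means. The input lists in the usage lines are made-up numbers for illustration only, not the paper's data. With $R$ shuffles the smallest p-value it can report is $1/(R+1)$, so it cannot resolve values as small as those quoted above.

```python
import random


def permutation_test(x: list[float], y: list[float], rounds: int = 10_000, seed: int = 0):
    """Two-sided permutation test for a difference in means between samples x and y.

    Returns (observed difference of means, estimated p-value).
    """
    rng = random.Random(seed)
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled = list(x) + list(y)
    n = len(x)
    extreme = 0
    for _ in range(rounds):
        rng.shuffle(pooled)  # random relabelling under the null hypothesis of no difference
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n)
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labelling itself, so the estimate is never 0
    return observed, (extreme + 1) / (rounds + 1)


# Hypothetical prime counts for small and large seeds (illustrative numbers, not the paper's data)
small_seeds = [18, 15, 17, 20, 16, 14, 19, 17]
large_seeds = [14, 13, 16, 12, 15, 14, 13, 15]
print(permutation_test(small_seeds, large_seeds))
```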
This explains why there was such a significant difference between the bottom 10% and the top 10% of the multiplier distribution. It also explains the shape of the distribution: since primes become sparser as numbers grow, sequences with larger values will more often yield fewer primes, so the distribution is skewed, with most of its mass at lower prime counts. We can also estimate the expected number of primes $E(\{M_n\})$ in a given (1000-term) Lucas sequence by summing the first 1000 probabilities:

$$E(\{M_n\}) = \sum_{n=1}^{1000} \frac{1}{\ln M_n}.$$
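The estimate $E(\{M_n\})$ can be sketched directly from the recurrence (1). Two details are our own assumptions and are labeled as such in the code: terms below 2 (including any negative terms) contribute nothing, and each probability is capped at 1, since $1/\ln 2 > 1$. Python's `math.log` accepts arbitrarily large integers, so the 1000-term sum needs no special handling.

```python
import math


def expected_prime_count(a: int, b: int, P: int, Q: int, terms: int = 1000) -> float:
    """PNT estimate E = sum over the first `terms` terms of 1/ln(M_n), for M(a, b, P, Q)
    with a_n = P*a_{n-1} - Q*a_{n-2}.

    Assumptions: terms below 2 (including negative terms) contribute nothing, and each
    probability is capped at 1, since 1/ln(2) > 1.
    """
    total = 0.0
    x, y = a, b
    for _ in range(terms):
        if x >= 2:
            total += min(1.0, 1 / math.log(x))  # math.log accepts arbitrarily large ints
        x, y = y, P * y - Q * x
    return total


# Small seeds versus large seeds for the general Fibonacci sequence (P = 1, Q = -1)
print(expected_prime_count(1, 2, 1, -1))
print(expected_prime_count(101, 200, 1, -1))
```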

Figure 5: Our prime density estimate of the number of primes yielded for various seed values (left) and multiplier values (right). The mean of the left distribution is 9.33 with a standard deviation of 0.5, while the mean of the right distribution is 1.70 with a standard deviation of 0.38. Both are large underestimates of the actual means, but the shapes of the distributions appear similar.

Although the PNT provides a very poor estimate of prime probabilities here (as seen from the multiplier estimate), there are some similarities, for different seed values in particular, between the distribution produced by the PNT estimate and the distribution produced by the actual data. The distributions observed in the estimate of prime probabilities, in the primes within various multipliers of Fibonacci and Lucas sequences, and in the primes within Lucas sequences with different initial seeds all appear strikingly similar. This is somewhat expected, since the PNT predicts that both smaller initial values and smaller multipliers generally yield more primes, although why these various situations lead to such similar graphs is unknown. No common distribution appeared to model the data: none of the normal, Poisson, gamma or beta-prime distributions gave satisfactory results. Using various statistical distribution tests in Mathematica, the p-value for the normal distribution was on the order of $10^{-14}$, indicating that the data are very unlikely to come from that distribution. A Poisson distribution seemed the most likely candidate, as shown by Gallagher [4] in the limit as the bins tend to infinity. Unfortunately, it too failed the tests, with a p-value also on the order of $10^{-14}$. Other distributions, such as the beta-prime, binomial and gamma distributions, were attempted and also failed, with p-values on the order of $10^{-15}$ and $10^{-14}$.
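One quick way to see why a Poisson law may fail is to compare the observed prime-count frequencies with those of a Poisson distribution of the same mean, and to check the dispersion index (variance divided by mean), which is close to 1 for Poisson data. The Python sketch below is our own diagnostic, not the goodness-of-fit tests the paper ran in Mathematica, and the sample list in the usage line is made up for illustration.

```python
import math
from collections import Counter


def poisson_comparison(counts: list[int]) -> tuple[float, float, dict[int, tuple[int, float]]]:
    """Compare observed prime counts with a Poisson law of the same mean.

    Returns (lambda, dispersion index variance/mean, {k: (observed, expected)}).
    Assumes the mean of `counts` is positive.
    """
    n = len(counts)
    lam = sum(counts) / n
    var = sum((c - lam) ** 2 for c in counts) / n  # population variance
    observed = Counter(counts)
    table = {}
    for k in range(max(counts) + 1):
        expected = n * math.exp(-lam) * lam**k / math.factorial(k)
        table[k] = (observed.get(k, 0), expected)
    return lam, var / lam, table


# Made-up prime counts, only to show the call
lam, dispersion, table = poisson_comparison([12, 15, 9, 18, 15, 14, 21, 11, 16, 13])
print(f"lambda = {lam:.2f}, dispersion index = {dispersion:.2f}")
```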
Oddly, based on the shape of the graphs, the gamma distribution appeared to be the closest, yet it produced the lowest p-value, indicating that it was still a poor fit. A normal distribution is unlikely for two main reasons: it assumes an even and random spread of data on either side of the mean, and it is a continuous distribution, whereas our data are clearly discrete. A binomial distribution is also inappropriate, since the first 1000 terms of a sequence are not independent success/failure trials with a common probability of success (the chance that a term is prime shrinks as the terms grow). Additionally, because of certain special properties of Lucas sequences, the primes obtained are not exactly random. For example, the Fibonacci sequence has two odd terms for every even term. If our sample were drawn from a random set of integers, even and odd terms would each have probability 50%. However, because of the recursive addition in the Fibonacci sequence, a term is odd with probability two-thirds and even with probability one-third. Moreover, the recursive structure of the Fibonacci sequence guarantees that every Fibonacci number except $F_1$, $F_2$, $F_6 = 8$ and $F_{12} = 144$ has a prime factor that does not divide any earlier Fibonacci number [3]. Thus the divisibility structure of the ordinary integers (a factor of 2 in every second integer, 3 in every third, etc.) is replaced by a different pattern in the Fibonacci sequence; for example, 2 divides every third term and 3 divides every fourth term. Such properties also appear in other Lucas sequences, such as the Lucas sequence describing Mersenne numbers. Thus the sequences obtained by changing either the seeds or the multipliers share these properties, which makes them difficult to model with a simple distribution.

Conclusion.
After compiling all of the primes generated by these Lucas sequences and performing statistical tests on various subsets of them, smaller initial seeds were found to differ significantly from larger seeds, producing about 2 more primes on average. Similarly, smaller multipliers were found to produce about 5 more primes on average. Although Gallagher [4] predicts that the distribution of these primes should follow a Poisson distribution as the intervals tend to infinity, our data do not show this, which may be because we used finite bins over only the first 1000 terms of each Lucas sequence. The distribution of primes, which appeared similar in all tests, remains unknown and requires further investigation to confirm that it tends to a Poisson distribution.

Acknowledgments. Though many people contributed to the writing of this paper, we would like to thank two parties in particular, without whose efforts and time this paper would not have been possible. For his excellent advice, sharp wit and rugged good looks, we would like to thank our advisor, Professor Fok-Shuen Matthew Leung. For an excellent and thorough job of editing this paper, we would like to thank Paul Kapos and Bennet Leung. Additional gratitude goes to all who have helped but whose names are not explicitly mentioned.

References
[1] Ribenboim, P. (2000). My Numbers, My Friends: Popular Lectures on Number Theory. New York: Springer.
[2] Great Internet Mersenne Prime Search (2011). Retrieved from http://www.mersenne.org/.
[3] Carmichael, R. D. (1913-1914). On the Numerical Factors of the Arithmetic Forms $\alpha^n \pm \beta^n$. The Annals of Mathematics 15, 30-48.
[4] Gallagher, P. X. (1976). On the Distribution of Primes in Short Intervals. Mathematika 23, 4-9.

Appendix. Ranking system of the multipliers and seeds. To group the multipliers and seeds into the top 10% and bottom 10%, scores were assigned to each pair $(P, Q)$ and $(a, b)$ based on the growth rate of their sequences. For different seeds, since the Fibonacci sequence can be approximated by

$$F_n = \operatorname{round}\left[\frac{\varphi^n}{\sqrt{5}}\right],$$

we can approximate $G_n$ as

$$G_n = a F_{n-2} + b F_{n-1} \approx a\,\frac{\varphi^{n-2}}{\sqrt{5}} + b\,\frac{\varphi^{n-1}}{\sqrt{5}} = \left(\frac{a}{\varphi} + b\right)\frac{\varphi^{n-1}}{\sqrt{5}} = K\,\frac{\varphi^{n-1}}{\sqrt{5}},$$

where $\varphi = \frac{1 + \sqrt{5}}{2}$ and $K = \frac{a}{\varphi} + b$. As the maximum value of $a$ and $b$ is 1000, the maximum value of $K$ is $K_{\max} = 1000\left(\frac{1}{\varphi} + 1\right)$.
Thus the bottom 10% of seeds can be defined as seeds with $K < 0.1K_{\max}$, and the top 10% as seeds with $K > 0.9K_{\max}$. Defining $\alpha = \frac{P + \sqrt{D}}{2}$ (as in the Introduction), we see that for different multipliers the growth is controlled mostly by $\alpha$. The maximum growth rate therefore occurs when $P = Q = 200$:

$$\alpha_{\max} = \frac{P + \sqrt{P^2 + 4Q}}{2} = 100 + 10\sqrt{102} \approx 201.0,$$

where the $+4Q$ comes from computing the Lucas sequences $L_n(P, Q)$ with the recurrence $a_n = P a_{n-1} + Q a_{n-2}$. From this, we define the bottom 10% as multipliers with $\alpha < 0.1\alpha_{\max}$ and the top 10% as those with $\alpha > 0.9\alpha_{\max}$.
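The scoring rules of this Appendix can be sketched in Python as follows. The function names are hypothetical, and `multiplier_score` assumes the $+4Q$ form above, that is, the recurrence $a_n = P a_{n-1} + Q a_{n-2}$. The sketch also checks the approximation $G_n \approx K\varphi^{n-1}/\sqrt{5}$ against the exact recurrence for one seed pair.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
K_MAX = 1000 * (1 / PHI + 1)


def seed_score(a: float, b: float) -> float:
    """K = a/phi + b, so that G_n(a, b) is approximately K * phi**(n - 1) / sqrt(5)."""
    return a / PHI + b


def multiplier_score(P: float, Q: float) -> float:
    """Dominant root alpha = (P + sqrt(P^2 + 4Q)) / 2, assuming a_n = P*a_{n-1} + Q*a_{n-2}."""
    return (P + math.sqrt(P * P + 4 * Q)) / 2


ALPHA_MAX = multiplier_score(200, 200)  # 100 + 10*sqrt(102), about 201.0


def size_class(score: float, max_score: float) -> str:
    """Bottom 10% if score < 0.1*max, top 10% if score > 0.9*max, otherwise middle."""
    if score < 0.1 * max_score:
        return "bottom 10%"
    if score > 0.9 * max_score:
        return "top 10%"
    return "middle"


# Check the growth approximation against the exact recurrence for G_n(3, 7)
x, y = 3, 7
for _ in range(38):  # advance so that x = G_39(3, 7)
    x, y = y, x + y
print(x / (seed_score(3, 7) * PHI**38 / math.sqrt(5)))  # ratio very close to 1

print(size_class(seed_score(1, 2), K_MAX), size_class(seed_score(990, 999), K_MAX))
print(size_class(multiplier_score(1, 1), ALPHA_MAX), size_class(multiplier_score(199, 199), ALPHA_MAX))
```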