2 3 47 6 23 Journal of Integer Sequences, Vol. 8 2005, Article 05.2.8 On the Density of Languages Representing Finite Set Partitions Nelma Moreira and Rogério Reis DCC-FC & LIACC Universidade do Porto R. do Campo Alegre 823 450 Porto Portugal nam@ncc.up.pt rvr@ncc.up.pt Abstract We present a family of regular languages representing partitions of a set of n elements in less or equal c parts. The density of those languages is given by partial sums of Stirling numbers of second kind for which we obtain explicit formulas. We also determine the limit frequency of those languages. This work was motivated by computational representations of the configurations of some numerical games. The languages L c Consider a game where natural numbers are to be placed, by increasing order, in a fixed number of columns, subject to some specific constraints. In these games column order is irrelevant. Numbering the columns, game configurations can be seen as sequences of column numbers where the successive integers are placed. For instance, the string 23 stands for a configuration where, 2, 4 were placed in the first column, 3 was placed in the second and 5 was placed in the third. Because column order is irrelevant, and to have a unique representation for each configuration, it is not allowed to place an integer in the Work partially funded by Fundação para a Ciência e Tecnologia FCT and Program POSI.
kth column if the k th is still empty, for any k >. Blanchard and al. [BHR04] and Reis and al. [RMP04] used this kind of representation to study the possible configurations of sum-free games. Given c columns, let N c = {,..., c}. We are interested in studying the set of game configurations as strings in N c, i.e., in the set of finite sequences of elements of N c. Game configurations can be characterised by the following language L c N c : L c = {a a 2 a k N c i N k, a i max{a,..., a i } + }. For c = 4, there are only 5 strings in L 4 of length 4, instead of the total possible 256 in N 4 4 : 2 2 22 23 2 22 23 22 222 223 23 232 233 234 Given a finite set Σ, a regular expression r.e. α over Σ represents a regular language Lα Σ and is inductively defined by: is a r.e and L = ; ɛ empty string is a r.e and Lɛ = {ɛ}; a Σ is a r.e and La = {a}; if α and α 2 are r.e., α +α 2, α α 2 and α are r.e., respectively with Lα + α 2 = Lα Lα 2, Lα α 2 = Lα Lα 2 and Lα = Lα, where we assume the usual precedence of the operators see [HMU00]. A regular expression α is unambiguous if for each w Lα there is only one path through α that matches w. Theorem.. For all c, L c is a regular language. Proof. For c =, we have L = L. We define by induction on c, a family of regular expressions: It is trivial to see that For instance, α 4 is α =, c α c = α c + j + + j. 2 α c = i= j= j= i j + + j. 3 + 2 + 2 + 2 + 2 3 + 2 + 3 + 2 + 2 3 + 2 + 3 4 + 2 + 3 + 4. It is also obvious that Lα c Lα c, for c >. For any c, we prove that L c = Lα c. L c Lα c : If x Lα c it is obvious that x L c. L c Lα c : By induction on the length of x L c : If x = then x Lα Lα c. Suppose that for any string x of length n, x Lα c. Let x = n + and x = ya, where a N c and y Lα c. Let c = max{a i a i y}. If c = c, obviously x Lα c. If c < c, then y Lα c, and x Lα c + Lα c. 2
2 Counting the strings of L c The density of a language L over a finite set Σ, ρ L n, is the number of strings of length n that are in L, i.e., In particular, the density of L c is ρ L n = L Σ n. ρ Lc n = L c N n c. Using generating functions we can determine a closed form for ρ Lc n. Recall that, a ordinary generating function for a sequence {a n } is a formal series see [GKP94] Gz = a n z n. i=0 If Az and Bz are generating functions for the density functions of the languages represented by unambiguous regular expressions A and B, and A + B, AB and A are also unambiguous r.e., we have that Az + Bz, AzBz and, are the generating Az functions for the density functions of the corresponding languages see [SF96], page 378. As α c are unambiguous regular expressions, from 3, we obtain the following generating function for {ρ Lc n}: Notice that T c z = i i= j= z jz = S i z = i j= z i i= i j= jz z i jz. are the generating functions for the Stirling numbers of second kind Sn, i = i i j i j n, i! j j=0 which are, for each n, the number of ways of partitioning a set of n elements into i nonempty sets see [GKP94] and A008277. Then, a closed form for the density of L c, ρ Lc n, is given by ρ Lc n = Sn, i, 4 i= i.e., a partial sum of Stirling numbers of second kind. In Table we present the values of ρ Lc n, for c =..8 and n =..3. For some sequences, we also indicate the corresponding number in Sloane s On-Line Encyclopedia 3
of Integer Sequences [Slo03]. The closed forms were calculated using the Maple computer algebra system [Hec03]. From expression 4, it is also easy to see that Theorem 2.. lim ρ L c n = B n, c where B n are the Bell numbers, i.e., for each n, the number of ways a set of n elements can be partitioned into nonempty subsets. Proof. Bell numbers, B n, can be defined by the sum n B n = Sn, i. i= And, as Sn, i = 0 for i > n, we have n lim ρ L c n = Sn, i + lim c i= c i=n+ Sn, i = B n. In Table, for each c, the subsequence for n c coincides with the first c elements of B n A0000. Moreover, we can express ρ Lc n, and then the partial sums of Stirling numbers of second kind, as a generic linear combination of nth powers of k, for k N c. Let S j n, i denote the jth term in the summation of a Stirling number Sn, i, i.e., S j n, i = i i! j i j n. j Lemma 2.. For all n and 0 i n, Proof. S n, i + = S 0 n, i = S n, i +. 5 i +! i + i n = i! in = S 0 n, i. Applying 5 in the summation 4 of ρ Lc n, each term Sn, i simplifies the subterm S n, i with the subterm S 0 n, i, for i 2. We obtain ρ L n = S 0 n,, ρ L2 n = S 0 n, 2, ρ Lc n = S 0 n, c + i S j n, i, for c > 2; i=3 = cn i c! + i=3 S j n, i, for c > 2. 4
c ρ Lc n OEIS,,,,,,,,,,,,,,... 2 2 2n, 2, 4, 8, 6, 32, 64, 28, 256, 52, 024, 2048, 4096, 892,... 3 6 3n + 2, 2, 5, 4, 4, 22, 365, 094, 328, 9842, 29525, 88574, 26572,... A00705 4 24 4n + 4 2n + 3, 2, 5, 5, 5, 87, 75, 2795, 05, 43947, 75275, 700075, 279825,... A00758 5 20 5n + 2 3n + 6 2n + 3 8, 2, 5, 5, 52, 202, 855, 3845, 8002, 86472, 422005, 2079475, 0306752,... A056272 6 720 6n + 48 4n + 8 3n + 3 6 2n + 30, 2, 5, 5, 52, 203, 876, 4, 20648, 09299, 60492, 340327, 9628064,... A056273 7 5040 7n + 240 5n + 72 4n + 6 3n + 60 2n + 53 44, 2, 5, 5, 52, 203, 877, 439, 20, 579, 665479, 4030523, 25343488,... A099262 8 40320 8n + 440 6n + 360 5n + 64 4n + 80 3n + 53 288 2n + 03 280, 2, 5, 5, 52, 203, 877, 440, 246, 5929, 677359, 489550, 2724300,... A099263 Table : Density functions of L c, for c =..8. 5
If the sums are rearranged such that i = k + j, we have ρ Lc n = cn c 2 c! + c k S j n, k + j, 6 k= where S j n, k + j = k n k + j k + j j 7 = kn k! j. 8 Replacing 8 into equation 6 we get ρ Lc n = cn c 2 c! + k= k n c k k! j. 9 In equation 9, the coefficients of k n, k c, can be calculated using the following recurrence relation: And, we have γ = ; γ c = γ c + c, for c > ; c! γk c = γc k, for c > and 2 k c. k Theorem 2.2. For all c, ρ Lc n = γkk c n. 0 k= From the expression 0, the closed forms in Table are easily derived. Finally, we can obtain the limit frequency of L c in N c. Since ρ Nc n = cn, and lim n k c n = 0, for k c 2, we have Theorem 2.3. For all c, lim n ρ Lc n ρ Nc n = c!. 6
3 A bijection between strings of L c and partitions of finite sets The connection between the density of L c and Stirling numbers of second kind is not accidental. Each string of L c with length n corresponds to a partition of N n with no more than c parts. This correspondence can be made explicit as follows. Let a a 2 a n be a string of L c. This string corresponds to the partition {A j } j Nc of N n with c = max{a,..., a n }, such that for each i N n, i A ai. For example, the string 23 corresponds to the partition {{, 2}, {3}, {4}} of N 4 into 3 parts. This defines a bijection. That each string corresponds to a unique partition is obvious. Given a partition {A j } j Nc of N n with c c, we can construct the string b b n, such that for i N n, b i = j if i A j. For the partition {{, 2}, {3}, {4}}, we obtain 23. 4 Counting the strings of L c of length equal or less than a certain value Although strings of L c of arbitrary length represent game configurations, for computational reasons 2 we consider all game configurations with the same length, padding with zeros the positions of integers not yet in one of the c columns. In this way, we obtain the languages L 0 c = L c {0 }. So, determining the number of strings of length equal or less than n that are in L c is tantamount to determining the density of L 0 c, i.e., n = L 0 c {0} N c n. As seen in Section 2, and because L c {0 } = Lα c 0, the generating function T cz of n can be obtained as the product of a generating function for ρ Lc n, T c z, by a generating function for ρ {0} n, e.g., z. Thus, the generating function for {ρ L 0 n} is c T cz = T c z z = z i z i j= jz and a closed form for n is as expected n = i= n m= i= where m starts at because Sm, i = 0, for i > m. 2 The data structures used in the programs are arrays of fixed length. Sm, i, 7
Using expression 9 in we have ρ L 0 n = n; ρ L 0 2 n = 2 n ; n n = m= c m c! + c 2 k= k m k! = cn+ c c c c! + n = c k j c n c c c! + n If we use the equation 0, we have Theorem 4.. For all c, Proof. j, for c > 2; c 2 k n+ k c k + k k! j n = nγ c + + c 2 j k n k k! γk ckn+ k. k c k j. n = n γkk c m = m= k= γk c k= m= n k m = nγ c + γk ckn+ k. k In the Table 2 we present the values of n, for c =..8 and n =..3. As before, the limiting sequence as c is the sequence of partial sums of Bell numbers. Finally, we determine the limit frequency of L 0 c in N c {0}. Notice that ρ Nc {0} n = cn+ c, as it is a sum of the first n terms of a geometric progression of ratio c. We have, Theorem 4.2. For all c, lim n n ρ Nc {0} n = c!. 8
c n OEIS n, 2, 3, 4, 5, 6, 7, 8, 9, 0,, 2, 3,... 2 2 n, 3, 7, 5, 3, 63, 27, 255, 5, 023, 2047, 4095, 89,... A000225 3 4 3n + 2 n 4, 3, 8, 22, 63, 85, 550, 644, 4925, 4767, 44292, 32866, 398587,... A047926 4 8 4n + 2 2n + 3 n 5 9, 3, 8, 23, 74, 26, 976, 377, 4822, 58769, 234044, 9349, 3732370,... 5 96 5n + 8 3n + 3 2n + 3 8 n 5 32, 3, 8, 23, 75, 277, 32, 4977, 22979, 0945, 53456, 26093, 297683,... A099265 6 600 6n + 36 4n + 2 3n + 3 8 2n + 30 n 439 900, 3, 8, 23, 75, 278, 54, 5265, 2593, 3522, 736704, 43983, 23767895,... A099266 7 4320 7n + 92 5n + 54 4n + 3 32 3n + 30 2n + 53 44 n 3 64, 3, 8, 23, 75, 278, 55, 5294, 26404, 4583, 807062, 4837585, 308073,... 8 35280 8n + 200 6n + 288 5n + 48 4n + 20 3n + 53 44 2n + 03 280 n 57023 7600, 3, 8, 23, 75, 278, 55, 5295, 2644, 42370, 89729, 5009279, 32252379,... Table 2: Density functions of L 0 c, for c =..8. 9
Proof. lim n n ρ Nc {0} n = lim n cn+ cc nc c j c c!c n+ + c n+ c 2 + lim k n+ kc c k j n k k!c n+ = c lim c! n lim c n+ n c n+ c j nc + lim n c n+ = c!. c 2 + c c k j k k! lim n k c n c c n c n+ 5 Conclusion In this note we presented a family of regular languages representing finite set partitions and studied their densities. Although it is well-known that the number of partitions of a set of n elements into no more than c nonempty sets is given by partial sums of Stirling numbers of second kind, we determined explicit formulas for their closed forms, as linear combinations of k n, for k N c. We also determined the limit frequency of those languages, which gives an estimate of the space saved with those representations. Acknowledgements We thank Miguel Filgueiras and the referee for their comments in previous versions of the manuscript. We also thank J. Shallit for pointing us a connection between the density of L c and the Bell numbers. References [BHR04] Peter Blanchard, Frank Harary, and Rogério Reis, Partitions into sum-free sets, Integers: Electronic Journal of Combinatorial Number Theory, to appear, 2004. [GKP94] Roland Graham, Donald Knuth, and Oren Patashnik. Addison-Wesley, 994. Concrete Mathematics. [Hec03] Andri Heck. Introduction to Maple. Springer, 3rd edition, 2003. http://remote.science.uva.nl/%7eheck/maplebook/. 0
[HMU00] John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 2nd edition, 2000. [RMP04] Rogério Reis, Nelma Moreira, and João Pedro Pedroso. Educated brute-force to get h4. Technical Report DCC-2004-04, DCC-FC& LIACC, Universidade do Porto, July 2004. [SF96] Robert Sedgewick and Philippe Flajolet. Analysis of Algorithms. Addison-Wesley, 996. [Slo03] N.J.A. Sloane. The On-line Encyclopedia of Integer Sequences, 2003. http://www.research.att.com/ njas/sequences. 2000 Mathematics Subject Classification: Primary 05A5; Secondary 05A8, B73, 68Q45. Keywords: Partions of sets, Stirling numbers, regular languages, enumeration Concerned with sequences A0000 A000225 A00705 A00758 A008277 A047926 A056272 A056273 A099262 A099263 A099265 A099266. Received October 4 2004; revised version received April 7 2005. Published in Journal of Integer Sequences, May 8 2005. Return to Journal of Integer Sequences home page.