SORTING ALGORITHMS FOR BROADCAST COMMUNICATIONS: MATHEMATICAL ANALYSIS

PETER J. GRABNER AND HELMUT PRODINGER

Abstract. Assume that n persons communicate via a channel; to each person a certain number is attached, and the goal is to see the numbers of all persons, e.g. in order to sort them. In this paper three algorithms to achieve this are analyzed with respect to the average number of rounds. While a precise description of the algorithmic aspects can be found in the companion paper [4], we concentrate here on the mathematical aspects of the analysis. The quantities of interest can be written as certain contour integrals involving zeta functions. The numerical evaluation leads first, via residues, to extremely slowly converging series; accelerating them is a nontrivial task, which is carried out in a slightly more general fashion in order to fit all the applications.

1. Introduction

The purpose of this paper is two-fold: firstly, we want to discuss a probabilistic model for conflict resolution in broadcast communication; secondly, we want to present a method to compute numerically, to a high precision, certain integrals which occur in the asymptotic analysis of this model and also elsewhere.

Let us first describe the model. Assume that n persons communicate via a channel; to each person a certain number is attached, and the goal is to see the numbers of all persons, e.g. in order to sort them. We will analyze three algorithms to achieve this; the interest is in the average number of rounds. When more than one person sends at a time, a conflict arises. The conflict resolution schemes considered here are all based on coin flips: the persons involved in a conflict each flip a coin, and those who flipped tails step out and wait until those who flipped heads have resolved their conflict. A key issue is the selection of a loser, see [5, ]; it is also known as leader election, see [,, 7].
The loser is determined by consecutive rounds of coin flips; those who flip heads form smaller and smaller populations until a single person remains. There is, however, a special rule for the case that nobody flips heads: then this round has to be repeated. The loser announces his number, and then a recursive strategy is applied.

The first algorithm finds the maximal number present in the file. Such an algorithm per se appeared in [, 4]. It needs several rounds of recursive coin flips; after each round (a successful broadcast of a number) only the persons with numbers larger than the announced one go on recursively. Thus, in the process of finding the maximum, we already see several people, namely those who broadcast successfully. They are taken out, and the remaining set of people follows the same strategy recursively, until no persons are left. See [4] for a more detailed description of the algorithm.

Date: February 9, 4.
99 Mathematics Subject Classification. Primary: 8Q5; Secondary: 5A5, 5A, C5.
This author is supported by the START-project Y9-MAT of the Austrian Science Foundation.

The second algorithm does not use maxima; it is the loser who announces his number, and he thereby splits the set of persons into two subfiles (larger resp. smaller numbers), and one can go on recursively with the two subfiles until all persons have been seen. See [3] for a detailed description.

Finally, we discuss a very naive strategy, where we just take the loser out and find the next loser of the reduced set of n - 1 persons, etc. This is naturally slower.

Loser selection works by coin flips; the first two algorithms additionally use a splitting à la binary search, whereas the naive strategy only works sequentially. It turns out that the second algorithm is the best, the intuitive reason being perhaps that it is good to split the file as soon as possible, i.e. when the first loser has been found.

The average number of rounds to select a loser is given by the recursion

\[ S_n = 1 + 2^{-n} \sum_{k=1}^{n} \binom{n}{k} S_k + 2^{-n} S_n, \qquad n \ge 2, \qquad S_0 = S_1 = 0; \]

the last term accounts for the rounds that have to be repeated because nobody flipped heads. Note that this algorithm itself is recursive. From [] we know the solution

\[ S_n = -\sum_{k=1}^{n-1} \binom{n}{k} \frac{B_k}{1 - 2^{-k}}, \qquad n \ge 1, \]

with Bernoulli numbers B_k. There is also the very useful representation of S_n as a contour integral, viz.

S n = +i Γ sΓn+ζ s Γn+ s s

See [3] for some background on such integrals. The average number of rounds M_n to find the maximum [, 4] is given by

n [ ] B k n M n = +, n ; k k +

it was also proved that

M n M n = +i ΓnΓ s Γn s ds. ζ s s ds.
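The loser-selection recursion and its Bernoulli-number solution can be cross-checked with exact rational arithmetic. The sketch below is only an illustration, not part of the paper: it assumes the recursion S_n = 1 + 2^{-n} \sum_{k=1}^{n} \binom{n}{k} S_k + 2^{-n} S_n with S_0 = S_1 = 0, the closed form S_n = -\sum_{k=1}^{n-1} \binom{n}{k} B_k / (1 - 2^{-k}), and the convention B_1 = -1/2. All function names are hypothetical. A small Monte Carlo simulation of the coin-flipping rounds is included for comparison.

```python
from fractions import Fraction
from math import comb
import random


def bernoulli_numbers(m):
    """Return [B_0, ..., B_m] as exact fractions (convention B_1 = -1/2)."""
    B = [Fraction(0)] * (m + 1)
    B[0] = Fraction(1)
    for n in range(1, m + 1):
        # Uses sum_{k=0}^{n} C(n+1, k) B_k = 0 for n >= 1.
        B[n] = -sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1)
    return B


def loser_rounds_recursive(n_max):
    """S_0..S_{n_max} from the recursion (assumed form, see lead-in)."""
    S = [Fraction(0)] * (n_max + 1)
    for n in range(2, n_max + 1):
        partial = sum(comb(n, k) * S[k] for k in range(1, n))
        # S_n appears twice on the right (k = n term and the repeated round),
        # so solve (1 - 2 * 2^-n) S_n = 1 + 2^-n * partial for S_n.
        S[n] = (2 ** n + partial) / Fraction(2 ** n - 2)
    return S


def loser_rounds_closed_form(n, B):
    """Closed form with Bernoulli numbers (assumed form, see lead-in)."""
    return -sum(comb(n, k) * B[k] / (1 - Fraction(1, 2 ** k)) for k in range(1, n))


def simulate_loser_rounds(n, trials, rng):
    """Average number of coin-flipping rounds until one person remains."""
    total = 0
    for _ in range(trials):
        alive = n
        while alive > 1:
            heads = sum(rng.random() < 0.5 for _ in range(alive))
            total += 1
            if heads > 0:  # if nobody flipped heads, the round is repeated
                alive = heads
    return total / trials


if __name__ == "__main__":
    S = loser_rounds_recursive(10)
    B = bernoulli_numbers(10)
    for n in range(2, 11):
        print(n, S[n], loser_rounds_closed_form(n, B))
```

Working with `Fraction` avoids any rounding, so agreement between the two routes is an exact identity check for the range of n tried, not merely a numerical coincidence.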

Several parameters of the probabilistic model described above can be expressed in terms of series involving the sequences M_n and S_n. These series turn out to converge so slowly that they are not even suitable for computing three digits of the values in question. In the specific case here, the second possibility for computing these constants is also not applicable: deforming the line of integration in the contour integral representation and collecting residues (cf. Section 5) again yields a series that converges too slowly. In Section 6 we will describe how to use the integral representation to obtain a rapidly converging algorithm for the numerical computation of these integrals. This algorithm is formulated rather generally and could be applied to similar problems immediately.

Remark. If the only task were to see all the elements, then the most obvious strategy is the conflict resolution scheme described in [8]; everybody shouts in the beginning, and then the conflict is resolved by consecutive rounds of coin flips. The average number of rounds for that is (apart from small fluctuations) (2/log 2) n; the constant 2/log 2 = 2.88539008 is better than the constants for the algorithms in this paper. However, in this way the data arrive in random order, whereas the strategies in this paper produce the sorted file very easily (one can think about the data as arranged in the Quicksort tree [], from which sorting is trivial).

2. The maximum finding strategy

Denote by T_n the average number of rounds with this method. Shiau and Yang [4] argue like this: There are extra costs of M_n for finding the maximum on the first level of recursion. The first successful broadcast, which can be any element with the same probability 1/n, splits the file into a subfile of n - 1 - k smaller elements, to which the recursive strategy is applied at a later stage, with costs T_{n-1-k}.
In the file of k larger elements, the search for the maximum is still continuing, but we have taken care already for the extra cost for maximum searching, thus a contribution T k M k. If k =, it costs however, and if k =, it costs, so that we have the recursion for n with T = and T =. We multiply that by n: write this with n replaced by n : and take differences: T n = M n + n T k + n T k M k + n n n, k= k= n n nt n = nm n + T k + T k M k +, k= k= n n n T n = n M n + T k + T k M k +, nt n n T n = nm n n M n +T n +T n M n, k= k= log

4 P. J. GRABNER AND H. PRODINGER or replacing n by k k + T k k Summing this from k = 3,...,n we get or T k T n n+ T 3 = n = M k M k. k + T n = 3 n++n+ n M k M k k + M k M k. k + Since asymptotics of M n are well known [4] we know a priori that T n A n, where the constant A is given by A = 3 + M k M k. k + With more effort, one could derive a full asymptotic expansion; see section for such computations. As cited already in the introduction, and therefore Define M k M k = k + M k M k = fs := Γ s n=3 +i +i ΓkΓ s Γk s ζ s ΓkΓ s k +Γk s s ds ζ s ds. s Γn n+γn s = 3s 5 s ss+ψ s, s where as usual ψs = Γ s identity was found with the help of Mathematica c. Γs As ψs is given by the uniformly convergent expansion ψs = n= n+s it is easy to see that ψ has double poles at the negative integers. For further details on ψ we refer to [9, 5]. Furthermore, we note that cf. [] for arg s < π ψ s = s + s s 3 + 3s 5 +O s.

Using the identity SORTING ALGORITHMS FOR BROADCAST COMMUNICATIONS 5 ψ s+ψ s = π sin πs + s we can find an estimate valid for all s with n N : s n > ε: fs = O s +. 3 ε e π Is Our goal is to compute the integral +i ζ s fs s ds. Expanding the ζ function and the geometric series it is k k +i j k sds. fs j j For the evaluation of the integral we need now +i for x fsx s ds = 3 x+x+x + xlogx for < x <. 3 x x 3 This equation is achieved as follows this is actually a standard technique in analytic number theory: to obtain the formula for x consider the integral over the segment { +it R t R} and the arc {σ +it σ +t = R,σ < }. Since the integrand has no poles in this region, the integral vanishes; now let R tend to and observe that the estimate 3 implies that the integral over the arc tends to. For x < we shift the line of integration to the right technically, we again truncate the integral at t = ±R and show that the contribution tends to for R and collect residues: +i fsx s ds = K k= { Res } fsx s + s=k K+ +i fsx s ds. K+ By 3 the integral is bounded by Ox K, which allows the limit K. Computing the residues and summing up yields the desired result. Using 4 we obtain k k +i j k sds fs = j j j k< j j k Φ k j 4

with

Φt := 3 t+t+t + tlogt 3 t t 3.

Observe that Φ(1) = 0, so that we can add the term k = 2^j to the sum. In Section 6 we will explain how to compute such sums numerically to a high precision. Therefore we get A = 3 +.75579 = 3.798. Note that the paper [4] provides the elementary bounds 7/2 = 3.5 ≤ A ≤ 23/6 = 3.833.

3. Splitting the file using the loser

It is not hard to find a recursion for T_n, the average number of rounds for this method. (We use the notation T_n again, but there is no chance for confusion.) The values T_0 = 1 and T_1 = 1 are self-explanatory, and for n ≥ 2 we have

\[ T_n = 1 + S_n + \frac{1}{n} \sum_{k=1}^{n} \bigl( T_{k-1} + T_{n-k} \bigr); \]

the 1 counts the initial broadcast, causing the conflict, which needs S_n rounds to be resolved. Then the file splits, each of the n possible splits having the same probability 1/n. See [3] for a more elaborate description. The traditional method to solve this is as in the previous section: multiply by n, write the result with n replaced by n - 1, and subtract, to get

\[ n T_n = n + n S_n + 2 \sum_{k=0}^{n-1} T_k, \]
\[ (n-1) T_{n-1} = (n-1) + (n-1) S_{n-1} + 2 \sum_{k=0}^{n-2} T_k, \]
\[ n T_n - (n+1) T_{n-1} = 1 + n S_n - (n-1) S_{n-1}. \]

Divide this by n(n+1):

\[ \frac{T_n}{n+1} - \frac{T_{n-1}}{n} = \frac{1}{n(n+1)} + \frac{S_n}{n+1} - \frac{n-1}{n(n+1)}\, S_{n-1}, \]

and sum this from n = 3, ..., N, but write again n for N:

\[ \frac{T_n}{n+1} - \frac{5}{3} = \frac{1}{3} - \frac{1}{n+1} + \sum_{k=3}^{n} \Bigl( \frac{S_k}{k+1} - \frac{k-1}{k(k+1)}\, S_{k-1} \Bigr), \]

or, using (k-1)/(k(k+1)) = 2/(k+1) - 1/k and S_2 = 2,

\[ \frac{T_n}{n+1} = \frac{8}{3} - \frac{1 + S_n}{n+1} + 2 \sum_{k=3}^{n} \frac{S_k - S_{k-1}}{k+1}, \]

from which we conclude that

\[ T_n = \frac{8}{3}\, n + \frac{5}{3} - S_n + 2(n+1) \sum_{k=3}^{n} \frac{S_k - S_{k-1}}{k+1}. \]

In [3] this is slightly wrong. We know again a priori that T_n ∼ A n. Our aim is to compute this constant A:

\[ A = \frac{8}{3} + 2 \sum_{k \ge 3} \frac{S_k - S_{k-1}}{k+1}. \tag{5} \]

From the integral formula for S k we conclude that S k S k = = +i +i Γ sΓk +ζ s Γk + s s sΓ sΓkζ s Γk + s s ds. ds+ +i Define fs = Γ s Γk k +Γk + s, k 3 which makes sense for Rs <. Mathematica found the identity fs = s 5s+4 s s s +sψ s, Γ sΓkζ s Γk s s which provides the analytic continuation of fs (notice that this identity is equivalent to ). Then ds Note that Therefore +i A = 8 3 + sfsζ s s +i s = k sfsζ s s ks. +i ds. ds = sfsζ s k ks ds.
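The solution of the recursion for the file-splitting strategy can be verified exactly for small n. The following sketch is only an illustration under stated assumptions: it takes T_0 = T_1 = 1 (an assumption; only T_0 + T_1 = 2 actually enters), the recursion T_n = 1 + S_n + (1/n) \sum_{k=1}^{n} (T_{k-1} + T_{n-k}), and the closed form T_n = (8/3) n + 5/3 - S_n + 2(n+1) \sum_{k=3}^{n} (S_k - S_{k-1})/(k+1). The function names are hypothetical. It also evaluates partial sums of the series for A, which shows how slowly that series converges.

```python
from fractions import Fraction
from math import comb


def loser_rounds(n_max):
    """S_0..S_{n_max} from the loser-selection recursion (S_0 = S_1 = 0)."""
    S = [Fraction(0)] * (n_max + 1)
    for n in range(2, n_max + 1):
        partial = sum(comb(n, k) * S[k] for k in range(1, n))
        S[n] = (2 ** n + partial) / Fraction(2 ** n - 2)
    return S


def split_rounds_recursive(n_max, S, t0=Fraction(1), t1=Fraction(1)):
    """T_0..T_{n_max}; t0 and t1 are assumed initial values."""
    T = [t0, t1] + [Fraction(0)] * (n_max - 1)
    for n in range(2, n_max + 1):
        # (1/n) * sum_{k=1}^{n} (T_{k-1} + T_{n-k}) = (2/n) * sum_{k<n} T_k
        T[n] = 1 + S[n] + Fraction(2, n) * sum(T[:n])
    return T


def split_rounds_closed_form(n, S):
    """Closed form for n >= 2 (assumed form, see lead-in)."""
    tail = sum((S[k] - S[k - 1]) / (k + 1) for k in range(3, n + 1))
    return Fraction(8, 3) * n + Fraction(5, 3) - S[n] + 2 * (n + 1) * tail


def constant_partial_sum(n, S):
    """Partial sum 8/3 + 2 * sum_{k=3}^{n} (S_k - S_{k-1}) / (k + 1)."""
    tail = sum((S[k] - S[k - 1]) / (k + 1) for k in range(3, n + 1))
    return float(Fraction(8, 3) + 2 * tail)


if __name__ == "__main__":
    S = loser_rounds(60)
    T = split_rounds_recursive(60, S)
    for n in (10, 20, 40, 60):
        print(n, float(T[n]) / n, constant_partial_sum(n, S))
```

The partial sums approach their limit only at rate O(1/n), which is the practical reason for the integral-based evaluation pursued in the text.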

The integral +i sfsζ s ks ds can be computed by shifting the line of integration to Rs = (again this is justified by the growth rate of the integrand) and expanding the ζ function: sfsn s ks ds. n= +i Thus it remains to compute the function +i hx = sfsx s ds. This is done in a similar way as above and gives

\[ h(x) = \begin{cases} \dfrac{x\,(17 - 14x + 13x^2 - 4x^3)}{6(1-x)^2} + \dfrac{x(1+x)}{(1-x)^3} \log x & \text{for } 0 < x \le 1, \\[1ex] 0 & \text{for } x > 1. \end{cases} \]

With this notation, the constant A evaluates to

\[ A = \frac{8}{3} + 2 \sum_{j \ge 0} \sum_{1 \le k < 2^j} \frac{1}{k}\, \Psi\Bigl( \frac{k}{2^j} \Bigr) \]

= 3.545578373885, where we write

\[ \Psi(t) = \frac{t\,(17 - 14t + 13t^2 - 4t^3)}{6(1-t)^2} + \frac{t(1+t)}{(1-t)^3} \log t; \]

see the last section for such evaluations. Note also that the paper [3] gives the bounds (after correcting the simple error) 3.33 ≤ A ≤ 4.

4. The naive algorithm

The average cost of selecting one loser is given by c_n = 1 + S_n with the quantities S_n from the introduction. Thus we have

\[ c_n = 1 - \sum_{k=1}^{n-1} \binom{n}{k} \frac{B_k}{1 - 2^{-k}}. \]

In total, we need to evaluate

\[ C_n = c_1 + \cdots + c_n. \]
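The evaluation of the constant A of Section 3 as a double sum over dyadic points can be tried out directly. The sketch below is an illustration only. It assumes Ψ(t) = t(17 - 14t + 13t^2 - 4t^3)/(6(1-t)^2) + t(1+t) log t/(1-t)^3 and compares a truncation of \sum_{j} \sum_{1 \le k < 2^j} (1/k) Ψ(k/2^j) with partial sums of \sum_{k \ge 3} (S_k - S_{k-1})/(k+1), which should have the same limit. Function names and truncation points are hypothetical choices. This is plain summation in floating point, not the accelerated method of the last section.

```python
from fractions import Fraction
from math import comb, log


def psi(t):
    """Psi(t) for 0 < t < 1 (assumed closed form, see lead-in)."""
    poly = t * (17 - 14 * t + 13 * t * t - 4 * t ** 3) / (6 * (1 - t) ** 2)
    return poly + t * (1 + t) * log(t) / (1 - t) ** 3


def dyadic_double_sum(levels):
    """sum_{j=1}^{levels} sum_{1 <= k < 2^j} psi(k / 2^j) / k."""
    total = 0.0
    for j in range(1, levels + 1):  # the level j = 0 contains no term
        n = 2 ** j
        total += sum(psi(k / n) / k for k in range(1, n))
    return total


def loser_rounds(n_max):
    """S_0..S_{n_max} from the loser-selection recursion (S_0 = S_1 = 0)."""
    S = [Fraction(0)] * (n_max + 1)
    for n in range(2, n_max + 1):
        partial = sum(comb(n, k) * S[k] for k in range(1, n))
        S[n] = (2 ** n + partial) / Fraction(2 ** n - 2)
    return S


def direct_series(n_max):
    """Partial sum of sum_{k >= 3} (S_k - S_{k-1}) / (k + 1)."""
    S = loser_rounds(n_max)
    return float(sum((S[k] - S[k - 1]) / (k + 1) for k in range(3, n_max + 1)))


if __name__ == "__main__":
    for levels in (4, 8, 12):
        print("dyadic levels", levels, dyadic_double_sum(levels))
    for n_max in (10, 20, 40):
        print("series terms", n_max, direct_series(n_max))
```

The dyadic sum settles after a moderate number of levels, but level j costs about 2^j function evaluations, while the direct series gains only about one digit per tenfold increase in the number of terms.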

We know SORTING ALGORITHMS FOR BROADCAST COMMUNICATIONS 9 n k= n Bk k k = since c = and c = 3 this leads to +i Γn+Γ s ζ s Γn+ s s ds; C n = n n +i Γk +Γ s ζ s Γk + s s ds. As before we sum up the terms in the integral n Γk +Γ s = Γn+Γ s Γk + s s+γn+ s s s s+ and obtain Now we use cf. [] to obtain C n = n+ +i +i s s s+ Γn+Γ s s+γn+ s ζ s s ds ζ s s ds. Γn+ Γn+ s = ns+ s 3s+ s 4 +O n n C n = n+ +i +i Γ s +i + Γ s s s s+ ζ s s ns+ ds ζ s s ds s 3s+ ζ s s ns ds+on. 7 8
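The summation of the Gamma-function kernel over k rests on the telescoping identity Γ(k+2)/Γ(k+1-s) - Γ(k+1)/Γ(k-s) = (s+1) Γ(k+1)/Γ(k+1-s). Assuming that the sum runs over k = 2, ..., n (an assumption suggested by the term 2/(s(s-1)(s+1))), this gives \sum_{k=2}^{n} Γ(k+1)Γ(-s)/Γ(k+1-s) = Γ(n+2)Γ(-s)/((s+1)Γ(n+1-s)) - 2/(s(s-1)(s+1)). The sketch below checks this numerically for a few real, non-integer values of s; the function names are hypothetical.

```python
from math import gamma, isclose


def summed_kernel(n, s):
    """Left-hand side: sum over k = 2..n of Gamma(k+1) Gamma(-s) / Gamma(k+1-s)."""
    return sum(gamma(k + 1) * gamma(-s) / gamma(k + 1 - s) for k in range(2, n + 1))


def telescoped_kernel(n, s):
    """Right-hand side obtained by telescoping (assumed lower limit k = 2)."""
    head = gamma(n + 2) * gamma(-s) / ((s + 1) * gamma(n + 1 - s))
    return head - 2.0 / (s * (s - 1) * (s + 1))


if __name__ == "__main__":
    for s in (-0.5, 0.3, 1.5):
        for n in (2, 5, 20):
            print(s, n, summed_kernel(n, s), telescoped_kernel(n, s))
```

Only real s are tried here because `math.gamma` does not accept complex arguments; the identity itself is algebraic and holds wherever both sides are defined.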

P. J. GRABNER AND H. PRODINGER We shift the line of integration to Rs = the poles on the line Rs = and obtain by calculating the residues at C n = n+ +i +i s s s+ + Γ s ζ s s ds +i Γ s ζ s s ns+ ds s 3s+ ζ s s ns ds+nlog n n+nflog n + 3 log n+ log 3 4 +Glog n+on, where F and G denote two periodic continuous functions of period and mean given by their Fourier-expansions χ k = kπi log Fx = log Gx = log k Z\{} k Z\{} It remains to compute the integral This is done as follows: I = +i Γ χ k ζ χ k e kπix +χ k 3 χ k Γ χ k ζ χ k e kπix. s s s+ ζ s s ds. 9 I = = = k= k= k= n= +i s s s+ ζ s ks ds +i s s s+ ζ s ks ds 3 +i s s s+ ns ks ds 3.

SORTING ALGORITHMS FOR BROADCAST COMMUNICATIONS As before we compute the integral We obtain +i s s s+ xs ds = k { 3x x for < x x for x >. n n I = 3 + k n k k n 3. k= n= n= k + After summing up the first sum and some obvious cancellation we arrive at I = k n. k= n= k This can be computed by the methods outlined in section ; the numerical value is given by I =.55335885955733883984434893839495... This gives with C n = nlog n+ 3 n+nflog n+ 3 log n+b +Glog n+on B = log 3 +I =.438954934738558483858747753937479.... 4 5. Mellin Integrals One possible way to give a reasonably convergent series expansion for the value of an integral of the form +i ζ s fs ds s would be to move the line of integration to the left. This yields +i fs ζ s s ds = { Res s=χ k k Z +i fs ζ s } + fs s ζ s s ds, where again χ k = kπi. Under the growth condition fs = log O s the last integral is zero by the same arguments as used above. Thus we have a series expansion for the integral log k Z\{} { fχ k ζ χ k +Res s= fs ζ s s }.

P. J. GRABNER AND H. PRODINGER Unfortunately, the rate of convergence of this sum is not better than the rate of convergence of the sums and 5. Thus this method does not yield a numerically feasible method to compute integrals of the form. The next section will describe how to find high precision approximations to the numerical value of such integrals.. How to compute the integrals numerically In this section we discuss how to compute a sums like S = j k k Φ and S j = k k Φ j j k< j j k< j numerically. We assume that Φ C m,], Φ = D, and Φt D t = m l= C l t l logt+q m t+ot m+ logt, 3 where Q m C m+ [,] and C l are constants these two properties are satisfied for the functions to which these studies will be applied later. In the sequel we will use the Bernoulli-polynomials P k t defined by P l t zl l! = zezt e z l= and the Bernoulli numbers B l = P l. Throughout this section we use the notation {x} to indicate the fractional part of x. Then we have k j k f k j = = j k= j k= k f D k j +D j [ j j = k + k= m l= k f D k j jlog+γ + j+ m k f D j j j C l k= j +D l= m l= B l l k j llog k j +D where R j m+ P m+ m+j. k= k + m+j lj k ] llog k C l j j jlog+γ + j+ m P m+ { j x}x m dx l= B l l +R lj j
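The term j log 2 + γ + ... in the computation above is the Euler-Maclaurin expansion of the harmonic number H_N at N = 2^j. As a minimal sketch (not the paper's code), the standard expansion H_N = log N + γ + 1/(2N) - \sum_{l=1}^{m} B_{2l}/(2l N^{2l}) + remainder can be compared with direct summation. The function names are hypothetical, and the value of Euler's constant γ is typed in by hand.

```python
from fractions import Fraction
from math import comb, log

EULER_GAMMA = 0.5772156649015329  # Euler's constant, entered by hand


def bernoulli_numbers(m):
    """Return [B_0, ..., B_m] as exact fractions (convention B_1 = -1/2)."""
    B = [Fraction(0)] * (m + 1)
    B[0] = Fraction(1)
    for n in range(1, m + 1):
        B[n] = -sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1)
    return B


def harmonic_direct(N):
    """H_N by direct summation."""
    return sum(1.0 / k for k in range(1, N + 1))


def harmonic_expansion(j, m):
    """Euler-Maclaurin approximation of H_N for N = 2^j with m correction terms."""
    B = bernoulli_numbers(2 * m)
    N = 2.0 ** j
    tail = sum(float(B[2 * l]) / (2 * l) / N ** (2 * l) for l in range(1, m + 1))
    return j * log(2) + EULER_GAMMA + 0.5 / N - tail


if __name__ == "__main__":
    for j in (3, 5, 8):
        print(j, harmonic_direct(2 ** j), harmonic_expansion(j, 3))
```

Already for moderate j a handful of Bernoulli terms reproduces H_N to nearly machine precision, which is what makes replacing the tail of the j-sum by its asymptotic expansion attractive.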

SORTING ALGORITHMS FOR BROADCAST COMMUNICATIONS 3 Weset gt = g m t = ft D m t l= C lt l logt; under ourassumptions onf, thefunction g is m times differentiable on [,] and the m+ st derivative is in L [,]. Then we have j j k= = k g = j + gtdt+ g g j + m+! m+j gtdt+ g g j + with R j m+! P m+ g m+. Finally, we compute the sums k= m l= B l g l g l lj l! P m+ { j x}g m+ xdx 4 j j m l= k= B l g l g l lj +R j l! k j rlog k j 5 for r N using the Euler-MacLaurin summation formula. For this purpose we first compute the sum j s+j k s = s+ + + m B l s lj + js ζ s +j l l m+j s m+ here we have used the identity cf. [] ζ s = s+ + m l= B l l s l l= + which is valid for Rs < m. Differentiating and setting s = r N yields j j k= k j rlog k j = r + + m l= B l l P m+ { j x}x s m dx; s m+ d ds s P m+ {x}x s m dx, l s=r lj +j jr ζ rlog jr ζ r 7 m+j d s P m+ { j x}x r m dx; ds m+ s=r

4 P. J. GRABNER AND H. PRODINGER here we have used that r m+ = for r < m. We note also that ζ r = B r+ for r r+ and ζ =. Finally, we observe that d s = ds l s=r rr!l r! l! l r l r t Inserting this into 7 and estimating the integral trivially yields j j k= k j rlog k j = r + + t= r+ l= for r l for r > l. l B l r l l t= +j jr ζ rlog jr ζ r r m l= r+3 r t lj r!l r! B l l! l lj +R j,r with R j,r r!m r! m+! m+j P m+. Extending the last sum 8 to m+ for r = m and putting everything together we obtain the C j j log term in the fourth line is to compensate the difference between ζ and B with j k= k k f = D jlog+γ + j m j+ ft D t C j j log+ dt+ g g j + m r= r C r l= jr+ ζ r r m l= r+ l= m l= l B l r l l t= B l l lj +R j + r!l r! B l l! l lj 8 B l g l g l lj + 9 l! r t lj +j jr+b r+ r + log + R j R j m+j D P m+ m+ + m+! gm+ + m +C m ζ m m+j + B m+ jm+ + P m+3 m+ jm+3 r= r!m r! C r m+! ;

here we have used that SORTING ALGORITHMS FOR BROADCAST COMMUNICATIONS 5 For the function ft D t dt = gtdt m r= ft = Φt = 3 t+t+t 3 t C r r +. + tlogt t 3 we have D = and C r = r +r +, and ft t dt =. For a numerical approximation of we split summation at J and replace the infinite part of the sum by the asymptotic estimate 9. For J = and m = this gives an error estimate of.5 45. Thus we have S =.75579745953784398743384487784544... In a similar way we can treat the sum In this case D = and of.75 48. Thus we have S = j k< j k Ψ k j Ψx dx =. For J = 8 and m = we obtain an error estimate x S =.439455733389354579379493734853... 7. Concluding remarks The method for the numerical computation of Mellin integrals as described in section can be easily generalized to integrals of the form c+i c i fs ζ s A s ds for A >, where fs is a meromorphic function in the whole complex plane satisfying the following properties: fs = O s ε for some positive ε and arg s < π fs has no poles left of a line Rs = σ the sum over the residues Resfsx s converges for < x <.

If A is not an integer, one has to use a slightly changed version of the Euler-MacLaurin summation formula:

k = A n gxdx A n g+{an } g+ k<a n g m l= B l g l g l A l n + l! m+! P m+ {A n x}g m+ xdx.

Again, singularities of g have to be subtracted as shown in Section 6.

References

[1] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions. Dover, New York, 95.
[2] J. A. Fill, H. M. Mahmoud, and W. Szpankowski. On the distribution for the duration of a randomized leader election algorithm. Ann. Appl. Probab., : 83, 99.
[3] P. Flajolet and R. Sedgewick. Mellin transforms and asymptotics: finite differences and Rice's integrals. Theor. Comput. Sci., 44 : 4, June 995.
[4] P. Grabner and H. Prodinger. An asymptotic study of a recursion occurring in the analysis of an algorithm on broadcast communication. Inf. Process. Lett., 5:89 93, 998.
[5] P. J. Grabner. Searching for losers. Random Structures and Algorithms, 4:99, 993.
[6] A. Ivić. The Riemann Zeta-Function. J. Wiley, New York, 985.
[7] S. Janson and W. Szpankowski. Analysis of an asymmetric leader election algorithm. Electron. J. Comb., 4:#R7, 7 pages, 997.
[8] P. Mathys and P. Flajolet. Q-ary collision resolution algorithms in random access systems with free or blocked channel access. IEEE Transactions on Information Theory, IT-3:7 43, March 985.
[9] N. Nielsen. Handbuch der Theorie der Gammafunktion. Teubner, Leipzig, 9.
[10] H. Prodinger. How to select a loser. Discrete Math., :49 59, 993.
[11] R. Sedgewick and P. Flajolet. Introduction to the Analysis of Algorithms. Addison-Wesley, Reading, Massachusetts, 99.
[12] S.-H. Shiau and C.-B. Yang. A fast maximum finding algorithm on broadcast communications. Inf. Process. Lett., :8 89, 99.
[13] S.-H. Shiau and C.-B. Yang. A fast sorting algorithm and its generalization on broadcast communications. preprint, 8 pages,.
[14] S.-H. Shiau and C.-B. Yang. A fast sorting algorithm on broadcast communications. Algorithmica,?: pages,. to appear.
[15] E. C. Titchmarsh. The Theory of Functions. Oxford University Press, 939.

(P. Grabner) Institut für Mathematik A, Technische Universität Graz, Steyrergasse 3, 8 Graz, Austria
E-mail address: grabner@weyl.math.tu-graz.ac.at

(H. Prodinger) The John Knopfmacher Centre for Applicable Analysis and Number Theory, Department of Mathematics, University of the Witwatersrand, P. O. Wits, 5 Johannesburg, South Africa
E-mail address: helmut@gauss.cam.wits.ac.za