COLLECTED WORKS OF PHILIPPE FLAJOLET


COLLECTED WORKS OF PHILIPPE FLAJOLET

Editorial Committee:

HSIEN-KUEI HWANG, Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan
ROBERT SEDGEWICK, Department of Computer Science, Princeton University, Princeton, NJ, USA
WOJCIECH SZPANKOWSKI, Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
BRUNO SALVY, Algorithms Project, INRIA Rocquencourt, Le Chesnay, France
MICHÈLE SORIA, Laboratoire d'Informatique, Université Pierre et Marie Curie, Paris cedex 05, France
BRIGITTE VALLÉE, Département d'Informatique, Université de Caen, Caen Cedex, France
MARK DANIEL WARD (General Editor), Department of Statistics, Purdue University, West Lafayette, Indiana, USA

ISBN: TBA, Cambridge University Press (print version); TBA (e-version)

COLLECTED WORKS OF PHILIPPE FLAJOLET

There will be several types of introductions, including an introduction to the entire series of books (written by Donald E. Knuth) and introductions to each specific volume (written by the editors of that volume).

Contents

Chapter I. STRING ALGORITHMS 1
  Introduction 1. TEXT ANALYSIS 3
  Paper 2. PAPER 74 9
  Paper 3. PAPER 76
  Paper 4. PAPER
Chapter II. INFORMATION THEORY 15
  ANALYTIC INFORMATION THEORY 17
    Analytic Information Theory 17
    Preliminary Discussion 18
    Minimax Redundancy for a Class of Sources 20
    Minimax Redundancy for Memoryless Sources 21
    Minimax Redundancy for Renewal Sources 23
  Paper 5. PAPER
  Paper 6. PAPER
  Paper 7. PAPER SEMINAR 31
  Paper 8. PAPER SEMINAR 33
Chapter III. DIGITAL TREES 35
  THE DIGITAL TREE PROCESS
    A central role in computer science
    Digital trees in Philippe Flajolet's works
    Conclusion 43
  Paper 9. PAPER
Chapter IV. MELLIN TRANSFORM 47
  DR FLAJOLET'S ELIXIR, OR MELLIN TRANSFORM AND ASYMPTOTICS 49
    Mellin transform and fundamental strip 49
    Symbolic analysis 50
    Fundamental result 51
    Harmonic sums 52
    Zigzag method 52
    Average-case analysis of algorithms and harmonic sums 53
    Exponentials in harmonic sums 54
    Technical point 55
    Oscillations 56
    Related topics 57
  Paper 10. PAPER
Chapter V. DIVIDE AND CONQUER 61
  DIVIDE-AND-CONQUER RECURRENCES AND THE MELLIN-PERRON FORMULA
    Introduction
    The basic technique
    Concluding Remarks 69
  Paper 11. PAPER
Chapter VI. COMMUNICATION PROTOCOLS 73
  FLAJOLET'S WORK ON TELECOMMUNICATION PROTOCOLS AND COLLISION RESOLUTION ALGORITHMS
    Introduction
    Telecommunication Protocols, Aloha protocol
    The tree collision resolution algorithm
    The free access tree algorithm
    Q-ary free access tree algorithm 87
  BIBLIOGRAPHY 89
  Paper 12. PAPER
BIBLIOGRAPHY 93
INDEX 103

Chapter I. STRING ALGORITHMS


INTRODUCTION

1. TEXT ANALYSIS

Pierre Nicodème

List of articles.
(#74)[66] Deviations from Uniformity in Random Strings (1988), P. Flajolet, P. Kirschenhofer and R.F. Tichy
(#76)[67] Discrepancy of Sequences in Discrete Spaces (1989), P. Flajolet, P. Kirschenhofer and R.F. Tichy
(#151,#174)[166, 167] Motif Statistics (1999-2002), P. Nicodème, B. Salvy and P. Flajolet
(#164)[63] Hidden Pattern Statistics (2001), P. Flajolet, Y. Guivarc'h, W. Szpankowski and B. Vallée
(#191)[121] Hidden Word Statistics (2006), P. Flajolet, W. Szpankowski and B. Vallée

Since the computing capabilities of computers developed in the sixties and the seventies, text analysis has been a field of study both for searching tools that find the positions of matches with a motif in a specific text, and for counting occurrences of motifs in random texts by combinatorial or probabilistic methods. Counting methods and statistics often provide limit laws, under various probability source models for the texts, which allows the detection of exceptional behaviours. As typical objects of computer science, finite automata have been used both for searching and for statistical analysis. Many statistical questions about word statistics have been solved by three different methods: combinatorial analysis, automata, and probability analysis. Such statistical research involves two objects: the source under which the text is generated, and the type of motif considered. The latter may be a single word; a finite set of words, reduced if no word is a factor of another word of the set, or not reduced in the contrary and more difficult case; an infinite set defined by a regular expression with stars; or a hidden word.

Deviations from Uniformity in Random Strings. The important contributions of Philippe Flajolet to text analysis must be situated historically with respect to the developments just mentioned; his work, however, also searches for intrinsic properties of texts.
In the article [66] Deviations from Uniformity in Random Strings (1988) 1, coauthored with P. Kirschenhofer and R.F. Tichy, he goes from the (intrinsic) normality properties of infinite strings, a problem set up by E. Borel during his research on measure theory, to the speed of convergence to uniformity for large sequences, a computer science problem. Normal numbers (E. Borel, 1908) are numbers such that any block of bits of a given size occurs with its natural probability (1/2^k for blocks of length k) in their infinite binary representation. P. Flajolet and his coauthors [66] cope with the asymptotic number of occurrences of blocks when a random binary sequence built upon a uniform Bernoulli source is large, but not infinite, a totally unexplored subject at the time. They first build a de Bruijn graph [25] counting simultaneously the occurrences of all words and deduce from it a universal Markov chain; the latter possesses strong convergence properties when the size k of the blocks remains fixed while the size n of the sequences tends to infinity, but these properties do not allow one to conclude when k tends to infinity and approaches log_2(n). Next comes an analysis based on word counting where Philippe Flajolet's influence is clear; the proofs are based on combinatorics of words à la Guibas-Odlyzko, very delicate asymptotic manipulations, and a saddle-point-like integral. Guibas and Odlyzko [130, 131] (1981) introduced the autocorrelation polynomial of a word, the correlation polynomial of two words, and the language parsing of a sequence with respect to occurrences of a pattern. A key lemma of the proof of the Deviations from Uniformity article [66] extends a result of Guibas and Odlyzko [129] (1978) and indicates that the relevant counting generating function has no poles inside a circle of integration |z| = 1 + ε for a suitably small ε. It is worth noting that the results of this article are optimal, proving that all words of size (1 - ε) log_2(n) occur with probability one in a binary random sequence of length n.

1. The article Discrepancy of Sequences in Discrete Spaces, although published in 1989, later than the 1988 article, is clearly a preliminary and unfinished version of the latter; we will therefore not discuss it.
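The autocorrelation polynomial just mentioned is easy to compute directly. The sketch below (function names are ours, chosen for illustration, not taken from the papers) lists the shifts at which a word overlaps itself; the polynomial c(z) is then the sum of z^k over these shifts.

```python
def autocorrelation(w):
    """Autocorrelation set of w (Guibas-Odlyzko): shifts k such that
    the length-(|w|-k) suffix of w equals the prefix of the same length."""
    return [k for k in range(len(w)) if w[k:] == w[:len(w) - k]]

def correlation_polynomial(w, z):
    """Evaluate the autocorrelation polynomial c(z) = sum of z^k over
    the autocorrelation set of w."""
    return sum(z ** k for k in autocorrelation(w))
```

For instance, "abaaba" overlaps itself at shifts 0, 3 and 5, so its autocorrelation polynomial is 1 + z^3 + z^5.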
Considering Proposition IV.4 of Flajolet and Sedgewick's book [114] and using bootstrapping as in Fayolle [40] (2004) should open the way to a generalization of the result to alphabets of any size. As a consequence, the fill-up level of a suffix tree built from an unbiased source upon a sequence of length n is likely to be (1 - ε) log_α(n) for an alphabet of cardinality α. Future work could study more general sources; in particular, the study of a general notion of discrepancy for biased sources should be compared with Knessl and Szpankowski's study [148] (2004) of the fill-up level in tries generated by a biased binary source, the analysis of the fill-up level of a suffix tree also remaining an open problem.

Motif Statistics. The articles [166, 167] entitled Motif Statistics (1999-2002), coauthored with P. Nicodème and B. Salvy, build upon important previous developments of theoretical computer science. It is worth recalling some cornerstones of automata theory, a major topic in this article dealing with regular expressions. Kleene [147] (1956) and Rabin and Scott [171] (1959) provided constructions of deterministic finite automata (DFAs) for regular expressions. Aho and Corasick devised an efficient algorithm [2] (1975) to construct an automaton for searching for a finite set of words, while Knuth, Morris and Pratt [149] (1977) gave a fast algorithm, also realized by an automaton, that searches for occurrences of a single word. In a fundamental article about context-free languages, Chomsky and Schützenberger [17] (1963) gave an algorithm computing the generating function of the words recognized by a deterministic finite automaton on a finite alphabet. This generating function is always the solution of a system of linear equations whose homogeneous part has coefficients that are monomials of degree one

with respect to the alphabet; it follows that the resulting generating functions (those of regular languages, by the classical automata constructions [147, 171] mentioned previously) are rational. Note that the results of P. Flajolet and his coauthors have wide generality, providing an algorithmic construction for regular patterns, a class which contains all finite patterns. From the resulting bivariate generating function follow the computation of the moments and access to the normal limit law. The automata constructions however hide the structural properties of finite patterns. These are mostly provided by language analysis, and once again we have to mention the pioneering work of Guibas and Odlyzko [130, 131] (1981); they followed the idea of parsing a text with respect to the occurrences of the pattern, and defined the languages Right of words finishing with the first occurrence of a word of the pattern, Minimal of words separating two occurrences, and Ultimate of words following the last occurrence. Guibas and Odlyzko provided the generating functions of these languages by recurrence; later, Régnier and Szpankowski [175, 176] (1997, 1998) and Régnier [172] (2000) provided a set of formal equations for these languages and proved, for single-word patterns, Gaussian or Poisson limits, depending on the number of occurrences being Θ(n) or O(1). Régnier and Denise [173, 174] (2003, 2004) obtained large deviation results for occurrences of one word conditioned on the number of observed occurrences of another word. Quite contemporary with the seminal work of Guibas and Odlyzko, Goulden and Jackson [127, 128] (1979, 1983) devised a very powerful analytic inclusion-exclusion method that provides multivariate counting for a reduced set of words. This was later generalized to general sets of words by Noonan and Zeilberger [168], and the corresponding complete proofs were given by Bassino et al. [7] (to appear).
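The Chomsky-Schützenberger translation can be illustrated concretely: the linear system attached to a DFA amounts to iterating a transfer matrix, so the number of accepted words of each length is obtained by dynamic programming. The minimal sketch below (the DFA and the function names are our illustrative choices, not the authors' code) counts binary words avoiding the factor 11; the counts are Fibonacci numbers, reflecting the rationality of the generating function.

```python
def count_accepted(delta, start, accept, alphabet, n):
    """Count length-n words accepted by a complete DFA, by iterating
    the linear system underlying the Chomsky-Schutzenberger translation
    (transfer-matrix dynamic programming)."""
    # counts[q] = number of words of the current length leading start -> q
    counts = {q: 0 for q in delta}
    counts[start] = 1
    for _ in range(n):
        new = {q: 0 for q in delta}
        for q in delta:
            for a in alphabet:
                new[delta[q][a]] += counts[q]
        counts = new
    return sum(counts[q] for q in accept)

# DFA over {0,1} rejecting any word containing the factor "11":
# state 0 = last letter was 0 (or word empty), state 1 = last letter
# was 1, state 2 = dead state ("11" has been seen).
delta = {0: {'0': 0, '1': 1}, 1: {'0': 0, '1': 2}, 2: {'0': 2, '1': 2}}
```

With accepting states {0, 1}, the counts 1, 2, 3, 5, 8, 13, ... are Fibonacci numbers, as predicted by the rational generating function of this regular language.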
Although the work of Goulden and Jackson [127, 128] deals with multivariate counting of occurrences of words, it seems that no multivariate counting with respect to the Chomsky-Schützenberger algorithm had been used before the Motif Statistics article. There, P. Flajolet and his coauthors use the trick of enlarging the alphabet by a counting letter of size zero, and of replacing each transition going to a final state by a compound transition that includes this letter. This counting letter records the number of positions where a match with the regular expression is found. Then, applying the Chomsky-Schützenberger algorithm provides a bivariate generating function for the lengths of the texts and the number of matching positions. Later, Nicodème [165] (2003) replaced the compound transitions by marked states, where a marked state corresponds to a matching position and there can be several types of marked states for multivariate counting of several regular expressions; each marked state then emits the corresponding counting variable during the processing of the Chomsky-Schützenberger algorithm. These marked states have the same algorithmic properties as final states; in particular, it is easy to generalize the determinization and minimization algorithms to automata with marked states, or to apply to them the construction providing a Markov automaton.

An important result of Motif Statistics is the asymptotic normal law for the number of matching positions of a regular expression in a random text. P. Flajolet and his coauthors provided an algorithm to cope with a Markovian source of any order, and therefore Motif Statistics generalizes the previous work of Régnier and Szpankowski [175, 176], who proved a Gaussian limit law in Markovian models for one word. It does not subsume, however, the work of Bender and Kochman [9], since they considered simultaneous counts of matches with patterns that are finite sets of words and obtained multivariate Gaussian limit laws. Since deterministic finite automata are representable by positive matrices, P. Flajolet and his coauthors use the Perron-Frobenius theorem asserting that positive matrices have a unique dominant real positive eigenvalue; this eigenvalue is a function of the formal variable used for counting the matching positions, a variable that is assumed to be positive. Then follows a uniform separation property of the poles of the rational bivariate generating function counting the text lengths and the matches, and an application of Cauchy's theorem provides a formula suitable for the application of Hwang's Quasi-Powers Theorem [135, 136] (1996, 1998); note that Heuberger's extension of this theorem [133] (2007) to two dimensions should apply to simultaneous statistics of two regular motifs, while no theory is presently available for statistics in higher dimensions. The variability condition of Hwang's theorem is asserted valid by a delicate log-convexity property of the dominant eigenvalue. Another algorithmic trick of the Motif Statistics article reduces the computation of asymptotic moments to the solution of a few linear systems with constant entries instead of one linear system with polynomial entries. Bourdon and Vallée [11] (2006) generalized the analysis of motif statistics to the case of dynamical sources. We note here that a purely probabilistic approach to word counting has been devised by several authors; for bibliographic entries, we refer to Chapter 6, Statistics on Words with Applications to Biological Sequences, of Lothaire's book [157] (2005) Applied Combinatorics on Words, which is devoted to this approach.
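The role of the dominant eigenvalue can be made concrete with a small numerical sketch, assuming a u-marked transfer matrix as described above: within the quasi-powers framework, the mean number of matching positions per text letter is λ'(1)/λ(1). The example matrix and helper names below are ours, not the authors': the matrix tracks occurrences of the factor 11 in a uniform binary text, for which the per-letter mean is 1/4.

```python
def dominant_eig(M, iters=2000):
    """Dominant eigenvalue of a nonnegative matrix by power iteration;
    Perron-Frobenius theory guarantees a unique dominant eigenvalue
    for the positive matrices arising here."""
    n = len(M)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)
        v = [x / lam for x in w]
    return lam

def mean_matches_per_letter(M_of_u, h=1e-5):
    """Estimate lambda'(1)/lambda(1) by a central difference: the
    asymptotic mean number of matching positions per text letter."""
    lp = dominant_eig(M_of_u(1 + h))
    lm = dominant_eig(M_of_u(1 - h))
    return (lp - lm) / (2 * h) / dominant_eig(M_of_u(1))

# u-marked transfer matrix for occurrences of the factor "11" in a
# binary text: states = last letter read (0 or 1); the 1 -> 1
# transition carries the mark u.
M11 = lambda u: [[1.0, 1.0], [1.0, u]]
```

Here λ(1) = 2 and λ'(1) = 1/2, so the estimate is 1/4, matching the probability that a given position starts an occurrence of 11 in a uniform binary text.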
This probabilistic method uses in particular the Chen-Stein method [182, 14] (1972, 1975) and Poisson and compound Poisson approximations [15] (2004). The algorithms presented in the Motif Statistics article have been programmed in the Maple package regexpcount, available within the algolib library 2, and applied to biological applications; indeed, the protein motifs of the database PROSITE 3 correspond to regular expressions generating finite languages; these languages, however, can have enormous sizes, while the automata recognizing them have moderate sizes that make them physically computable. As an experimental conclusion, P. Flajolet and his coauthors observe that the theoretical predictions of the number of occurrences of protein motifs tend to systematically underestimate the observations.

Hidden Pattern Statistics. This (2001) article, coauthored with Y. Guivarc'h, W. Szpankowski and B. Vallée, considers a model markedly different from that of Motif Statistics, where the number of matching positions in a sequence of length n is upper bounded by n. For instance, the hidden pattern a#a occurs n(n - 1)/2 times in the sequence a^n: all pairs of positions with an occurrence of the letter a are counted. This pattern is unconstrained, in opposition to the pattern a#_2 a, where all the pairs of positions with an a are counted under the constraint that the gap separating these positions is less than 2 symbols.

2. This library is available at
3. See
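Counting hidden occurrences with unbounded gaps is exactly counting occurrences as a subsequence, which a short dynamic program does; the sketch below (our illustrative code, not the authors') confirms the n(n - 1)/2 count of a#a in a^n.

```python
def count_hidden(text, pattern):
    """Number of occurrences of `pattern` as a hidden pattern with all
    gaps unbounded, i.e. as a subsequence of `text`."""
    # dp[j] = number of ways to match the first j pattern letters so far
    dp = [0] * (len(pattern) + 1)
    dp[0] = 1  # the empty pattern matches exactly once
    for c in text:
        # right-to-left so each text position extends each slot once
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[len(pattern)]
```

For example, count_hidden("a" * 6, "aa") returns 15 = 6 * 5 / 2, and "ab" occurs 3 times as a subsequence of "abab".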

P. Flajolet and his coauthors were motivated by the search for a statistical threshold for the detection of network intrusion, where a hidden pattern occurs repetitively with a variable number of intervening events between each occurrence. Hidden patterns can also serve as a model for the exon-intron systems in molecular biology, or help with searches in data mining. Last but not least, hidden pattern statistical analysis can put a bit of intelligence into the mysteries of secret codes and secret messages; following links from P. Flajolet's Web page, one finds that the book Moby Dick predicted in full detail the accidental death of Lady Di... More seriously, the line of proof of this article is crystal clear. The dominant effect in the analysis comes from the unbounded constraints # between two letters, as seen previously in the toy example a#a. When considering a generic random text, each of these unbounded constraints provides a factor 1/(1 - z) in the probability generating function of the first moment of the random variable counting the number of occurrences. There are also two unbounded constraints which do not belong intrinsically to the pattern, corresponding to the set of words preceding the occurrence of the pattern and to the set of words following this occurrence; these two constraints provide a factor 1/(1 - z)^2. This implies that a hidden pattern with b - 1 unbounded constraints # will lead to an expected number of occurrences that is Θ(n^b). The analysis of the second moment implies considering two simultaneous occurrences and possible intersections of blocks of positions, where a block is a set of positions corresponding to a maximal subpattern of the hidden pattern without unbounded constraints.
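The Θ(n^b) growth of the expectation can be checked exactly by brute force over all binary texts of small length: for the toy pattern aa (b = 2 blocks), each of the C(n, 2) position pairs matches with probability 1/4, so E[X_n] = C(n, 2)/4. A sketch, with function names of our own choosing:

```python
from itertools import product

def count_subseq(text, pattern):
    """Occurrences of `pattern` as a subsequence of `text`
    (hidden pattern with all gaps unbounded)."""
    dp = [0] * (len(pattern) + 1)
    dp[0] = 1
    for c in text:
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[-1]

def mean_count(n, pattern="aa"):
    """Exact expectation of the number of hidden occurrences of
    `pattern` in a uniform random binary text of length n."""
    texts = product("ab", repeat=n)
    return sum(count_subseq("".join(t), pattern) for t in texts) / 2 ** n

# For "aa": each of the C(n,2) position pairs matches with probability
# 1/4, hence E[X_n] = C(n,2)/4 = Theta(n^2), i.e. Theta(n^b) with b = 2.
```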
One occurrence of the pattern a#_2 b # aa # b will lead to three blocks of positions, and two simultaneous occurrences will lead to (i) three hitting blocks, (ii) two hitting blocks and two non-hitting blocks, (iii) one hitting block and four non-hitting blocks, or (iv) six non-hitting blocks, each block again contributing a factor 1/(1 - z) to the generating function of the second moment. Then, and the proof is as simple as it is beautiful, the smaller the number of intersecting blocks of two simultaneous occurrences, the larger the number of blocks and the higher the asymptotic contribution to the second moment, namely Θ(n^{2b-1}) for a hidden pattern with b - 1 unbounded constraints. The second moment is explicitly provided as a function of a generalized correlation of the pattern that extends the usual notion of word correlation. The k-th moment is also computed by considering k simultaneous occurrences, the minimum number of intersecting blocks corresponding once more to the maximum degree of freedom and to the asymptotically dominant terms; using some bijections and properties of involutions provides the authors with a proof, by convergence of moments, of an asymptotic Gaussian law.

Hidden Word Statistics. This (2006) article, coauthored with W. Szpankowski and B. Vallée, gives important extensions to the conference article Hidden Pattern Statistics. A delicate dynamic programming procedure computes the variance of the number of occurrences in polynomial time with respect to the size of the pattern specification. The limit distribution for fully constrained patterns is analysed from a different point of view, which leads to results that cannot be reached by convergence of moments. A de Bruijn graph [25] (1946) is used to embed the count of occurrences in the spirit of transfer matrices; then, Perron-Frobenius dominant properties and the Quasi-Powers Theorem of Hwang [134, 135] (1994, 1996) are used in a vein similar to their use in

the article Motif Statistics. P. Flajolet and his coauthors push the analysis of the dominant real eigenvalue of the system further by considering the cycles of the de Bruijn graph, obtaining a Central Limit Law, Large Deviations bounds, and, for primitive patterns, a Local Limit Theorem. P. Flajolet loved emphasizing his engineering skill, and the Hidden Word Statistics article presents an unexpected counting wheel machine that is a fruit of this skill. For the same reason, he wanted to apply his theoretical results; in the present case, he and his coauthors performed a comparison of the observed counts of a hidden word or sentence in Shakespeare's Hamlet with the expected corresponding counts computed under the same distribution of letters. The concluding section of the article provides an estimator for the detection of true alarms, which was the original motivation of the article. Bourdon and Vallée [10] (2002) provide an analysis of hidden pattern statistics under a dynamical source model, compute the first two moments of the count, and prove concentration in distribution to the expectation; they leave open, however, the proof of a limiting Gaussian law.

PAPER 2. Paper 74


PAPER 3. Paper 76


PAPER 4. Paper


Chapter II. INFORMATION THEORY


Analytic Information Theory

Wojciech Szpankowski

Philippe Flajolet had long-term research interests in algorithms, which are at the heart of virtually all computing technologies, and in combinatorics, which provides indispensable tools for finding patterns and structures. However, we shall argue that from the early days Flajolet was fascinated with information, which permeates every corner of our lives and shapes our universe. In fact, he was the midwife of, and an active participant in, analytic information theory, which combines analytic combinatorics and information theory [39, 163, 37, 81, 119, 145, 185, 146]. In this chapter we concentrate on the information theory of data compression and present Flajolet's work in this area [119, 145, 185, 146], while in other chapters of this volume we touch upon other information-theoretic work of Flajolet [39, 163, 37, 81].

Analytic Information Theory

Jacob Ziv, in his 1997 Shannon Lecture, presented compelling arguments for backing off from first-order asymptotics in order to predict the behavior of real systems with finite-length descriptions. To overcome these difficulties, so-called nonasymptotic analysis, in which lower and upper bounds are established with controllable error terms, has become quite popular. However, we argue that developing full asymptotic expansions and more precise analyses may be even more desirable. Furthermore, following Hadamard's precept 1, Flajolet and others proposed to study information theory problems using techniques of complex analysis 2 such as generating functions, combinatorial calculus, Rice's formula, the Mellin transform, Fourier series, sequences distributed modulo 1, saddle point methods, analytic poissonization and depoissonization, and singularity analysis [188]. This program, which applies complex-analytic tools to information theory, constitutes analytic information theory. Analytic information theory can claim some successes in the last decade.
We mention a few: proving in the negative the Wyner-Ziv conjecture regarding the longest match [183, 184]; establishing Ziv's conjecture regarding the distribution of the number of phrases in the LZ'78 compression scheme [139]; showing the right order of

1. The shortest path between two truths on the real line passes through the complex plane.
2. Andrew Odlyzko wrote: Analytic methods are extremely powerful and when they apply, they often yield estimates of unparalleled precision.

the LZ'78 redundancy [178, 158]; disproving the Steinberg-Gutman conjecture regarding lossy pattern matching compression schemes [159, 196, 153]; establishing the precise redundancy of Huffman's code [187] and the redundancy of a fixed-to-variable non-prefix-free code [189]; deriving precise asymptotics of the minimax redundancy for memoryless sources [195, 186, 190], Markov sources [177, 141] and renewal sources [120, 33]; the precise analysis of variable-to-fixed codes such as the Tunstall and Khodak codes [32]; designing and analyzing an error-resilient Lempel-Ziv'77 data compression scheme [155]; and finally, establishing the entropy of hidden Markov processes [137] and the noisy constrained capacity [132, 142]. In this chapter, we only present Flajolet's work on data compression, in particular on the minimax redundancy for renewal sources [120]. We start with some preliminary discussion.

Preliminary Discussion

Let us start with some definitions and preliminary results. A source code is a bijective mapping C : A* -> {0, 1}* from the set of all sequences over an alphabet A to the set {0, 1}* of binary sequences. We write x ∈ A* for a sequence of unspecified length, and x_i^j = x_i ... x_j ∈ A^{j-i+1} for a sequence of length j - i + 1. We denote by P the probability of the source, and write L(C, x) (or simply L(x)) for the code length of the source sequence x under the code C. Finally, the source entropy is defined as usual by H(P) = - ∑_{x} P(x) log P(x), and the entropy rate is denoted by h. We often present our results for the binary alphabet A = {0, 1}. Our interest lies in the so-called prefix codes, for which no codeword is a prefix of another codeword. For such codes there is a mapping between a prefix code and a path in a tree from the root to a terminal (external) node (e.g., for a binary prefix code, a move to the left in the tree represents 0 and a move to the right represents 1).
We also point out that a prefix code and the corresponding path in a tree define a lattice path in the first quadrant [32]. If some additional constraints are imposed on the prefix codes, this translates into certain restrictions on the lattice path [32]. The prefix condition imposes some restrictions on the code lengths. This fact is known as Kraft's inequality, discussed next.

Theorem 4.1 (Kraft's Inequality). Let |A| = 2. For any prefix code the codeword lengths l_1, l_2, ..., l_N satisfy the inequality

(1)   ∑_{i=1}^{N} 2^{-l_i} ≤ 1.

Conversely, if the codeword lengths satisfy this inequality, then one can build a prefix code.

Proof. This is an easy exercise on trees. Let l_max be the maximum codeword length. Observe that at level l_max some nodes are codewords, some are descendants of codewords, and some are neither. Since the number of descendants at level l_max of a

codeword located at level l_i is 2^{l_max - l_i}, we obtain

∑_{i=1}^{N} 2^{l_max - l_i} ≤ 2^{l_max},

which is the desired inequality. The converse part can also be proved, and is left for the reader.

Kraft's inequality begs an interesting combinatorial question: how many tuples (l_1, ..., l_N) are there such that equality holds in (1), that is, ∑_{i=1}^{N} 2^{-l_i} = 1? This question was answered in [81]. Observe that Kraft's inequality implies the existence of at least one sequence x̃ such that L(x̃) ≥ - log P(x̃). Actually, a stronger statement is due to Barron [6], who proved the following result.

Lemma 4.1 (Barron). Let L(X) be the length of a prefix code, where X is generated by a stationary ergodic source over a binary alphabet. For any sequence a_n of positive constants satisfying ∑_n 2^{-a_n} < ∞, the following holds:

Pr(L(X) < - log P(X) - a_n) ≤ 2^{-a_n},

and therefore

L(X) ≥ - log P(X) - a_n   (almost surely).

Proof: We argue as follows:

Pr(L(X) < - log_2 P(X) - a_n) = ∑_{x : P(x) < 2^{-L(x)-a_n}} P(x)
  ≤ ∑_{x : P(x) < 2^{-L(x)-a_n}} 2^{-L(x)-a_n}
  ≤ 2^{-a_n} ∑_x 2^{-L(x)} ≤ 2^{-a_n}.

The lemma follows from Kraft's inequality for binary alphabets and the Borel-Cantelli Lemma.

Using Kraft's inequality we can now prove the first theorem of Shannon, which bounds the average code length from below.

Theorem 4.2. For any prefix code, the average code length E[L(C, X)] cannot be smaller than the entropy of the source H(P), that is,

E[L(C, X)] ≥ H(P),

where the expectation is taken with respect to the distribution P of the source sequence X.
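The converse part of Theorem 4.1 is constructive: sorting the lengths and assigning codewords in canonical order always yields a prefix code when the Kraft sum is at most 1. A minimal sketch, with helper names of our own choosing:

```python
def kraft_sum(lengths):
    """Left-hand side of Kraft's inequality for a binary alphabet."""
    return sum(2.0 ** (-l) for l in lengths)

def canonical_prefix_code(lengths):
    """Build a binary prefix code with the given codeword lengths;
    possible whenever the Kraft sum is at most 1 (Theorem 4.1)."""
    assert kraft_sum(lengths) <= 1.0
    codewords, c, prev = [], 0, 0
    for l in sorted(lengths):
        c <<= (l - prev)            # extend the counter to the new length
        codewords.append(format(c, "0%db" % l))
        c += 1                      # next available codeword at this length
        prev = l
    return codewords
```

For instance, lengths (1, 2, 3, 3) saturate Kraft's inequality and produce the prefix-free codewords 0, 10, 110, 111.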

Proof. Let K = ∑_x 2^{-L(x)} ≤ 1 for a binary alphabet, and L(x) := L(C, x). Then

E[L(C, X)] - H(P) = ∑_{x∈A} P(x) L(x) + ∑_{x∈A} P(x) log P(x)
  = - ∑_{x∈A} P(x) log ( 2^{-L(x)} / (K P(x)) ) - log K ≥ 0,

since log x ≤ x - 1 for 0 < x ≤ 1 (i.e., the divergence is nonnegative), while K ≤ 1 by Kraft's inequality.

What is the best code length? We are now in a position to answer this question. As far as the expected code length is concerned, one needs to solve the following constrained optimization problem for, say, a binary alphabet:

min_L ∑_x L(x) P(x)   subject to   ∑_x 2^{-L(x)} ≤ 1.

This optimization problem has an easy solution through Lagrangian multipliers, and one finds that the optimal code length is L(x) = - log P(x), provided the integer character of the length is ignored. In general, one needs to round the length up to an integer, thereby incurring some cost. This cost is usually known under the name of redundancy. For a known distribution P, the pointwise redundancy R(x) for a code C, the average redundancy R̄, and the maximal redundancy R* are defined respectively as

R(x) = L(C, x) + log P(x),
R̄ = E[L(C, X)] - H(P) ≥ 0,
R* = max_x [L(C, x) + log P(x)].

The pointwise redundancy can be negative, but the average redundancy cannot, due to Shannon's theorem. From now on, we assume that the source string x is of length n, and study the maximal redundancy, denoted R*_n, for a class of sources S. Such redundancy is known as the minimax redundancy, discussed next.

Minimax Redundancy for a Class of Sources

Let us begin with a precise information-theoretic definition of the minimax redundancy and its Shtarkov bounds. Throughout this section, we write L(C_n, x^n) for the length of a fixed-to-variable code C_n : A^n -> {0, 1}* assigned to the source sequence x^n over the alphabet A = {1, 2, ..., m} of size m, which can be finite or not.
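The effect of rounding the optimal lengths up, the Shannon code, can be sketched numerically: with lengths ceil(-log_2 P(x)), Kraft's inequality holds and the average redundancy falls in [0, 1). The helper names below are our illustrative choices:

```python
from math import ceil, log2

def shannon_lengths(probs):
    """Shannon code lengths l(x) = ceil(-log2 p(x)); they satisfy
    Kraft's inequality, so a prefix code with these lengths exists."""
    return [ceil(-log2(p)) for p in probs]

def average_redundancy(probs):
    """Average redundancy E[L] - H(P) of the Shannon code;
    it always lies in [0, 1)."""
    H = -sum(p * log2(p) for p in probs)
    EL = sum(p * l for p, l in zip(probs, shannon_lengths(probs)))
    return EL - H
```

For the dyadic distribution (1/2, 1/4, 1/4) the lengths are (1, 2, 2) and the redundancy is exactly 0; for non-dyadic distributions it is strictly between 0 and 1.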
In practice, one can only hope to have some knowledge about a family of sources S that generates the data, such as the family of memoryless sources M_0 or Markov sources M_r of order r > 0. Following Davisson [24] and Shtarkov [181], we define the minimax worst-case (maximal) redundancy R*_n(S) for a family S as

(2)   R*_n(S) = min_{C_n} sup_{P∈S} max_{x_1^n} [ L(C_n, x_1^n) + log P(x_1^n) ],

where C_n ranges over prefix codes, and the source P ∈ S generates the sequence x^n = x_1 ... x_n. If we ignore the integer nature of the code length L(C_n, x^n), then we can approximate it by log 1/P_θ for some θ. Furthermore, log sup_{P∈S} P(x^n) =

log(1/P_θ̂), where θ̂ is the maximum likelihood (ML) estimator, so that

(3)   R*_n(S) = inf_θ max_{x^n} log ( P_θ̂(x^n) / P_θ(x^n) ) + O(1).

We now derive Shtarkov's bound [181]. Define first the maximum likelihood distribution

Q*(x^n) := sup_{P∈S} P(x^n) / ∑_{y^n∈A^n} sup_{P∈S} P(y^n).

Then observe [33]:

R*_n(S) = min_{C_n} sup_{P∈S} max_{x^n} ( L(C_n, x^n) + log P(x^n) )
  = min_{C_n} max_{x^n} ( L(C_n, x^n) + sup_{P∈S} log P(x^n) )
  = min_{C_n} max_{x^n} ( L(C_n, x^n) + log Q*(x^n) ) + log ∑_{y^n∈A^n} sup_{P∈S} P(y^n)
  = R_n^{GS}(Q*) + log ∑_{y^n∈A^n} sup_{P∈S} P(y^n),

where 0 < R_n^{GS}(Q*) ≤ 1 is the redundancy of the optimal generalized Shannon code (see [33]). Therefore, ignoring again the integer constraint (i.e., setting R_n^{GS}(Q*) = 0) and using (3) rather than (2), we arrive at

R*_n(S) = log ∑_{x^n∈A^n} sup_{θ∈Θ} P_θ(x^n).

From now on, we assume that R*_n(S) = log D_{n,m}(S), where

(4)   D_{n,m}(S) = ∑_{x^n∈A^n} sup_{P∈S} P(x^n).

The O(1) term in (3) can be computed for finitely parameterized sources as in [33], but we will not elaborate on it here.

Minimax Redundancy for Memoryless Sources

As a warm-up exercise, we first consider the minimax redundancy for the class of memoryless sources over an alphabet of size m. We follow here [186]. Observe that D_{n,m} := D_{n,m}(M_0) defined in (4) takes the form

(5)   D_{n,m} = ∑_{k_1+...+k_m=n} (n! / (k_1! ... k_m!)) (k_1/n)^{k_1} ... (k_m/n)^{k_m},

where k_i is the number of times symbol i ∈ A occurs in a string of length n. Indeed, observing that P(x^n) = p_1^{k_1} ... p_m^{k_m}, where the p_i are unknown parameters θ representing

the probability of symbol i ∈ A, we proceed as follows:

D_n(M_0) = ∑_{x_1^n} sup_P P(x_1^n)
  = ∑_{k_1+...+k_m=n} (n! / (k_1! ... k_m!)) sup_{p_1,...,p_m} p_1^{k_1} ... p_m^{k_m}
  = ∑_{k_1+...+k_m=n} (n! / (k_1! ... k_m!)) (k_1/n)^{k_1} ... (k_m/n)^{k_m},

where the last line follows from sup_{p_1,...,p_m} p_1^{k_1} ... p_m^{k_m} = (k_1/n)^{k_1} ... (k_m/n)^{k_m}. We should point out that (5) has a form that reappears in the redundancy analysis of other sources. Indeed, the summation is over tuples (k_1, ..., k_m) representing a (memoryless) type, and under the sum the first term, the multinomial coefficient, counts the number of sequences x^n of the same type, while the second term is the maximum likelihood distribution. It is argued in [186] that the asymptotics of such a sum can be analyzed through its so-called tree-like generating function, defined as

D_m(z) = ∑_{n=0}^∞ (n^n / n!) D_{n,m} z^n.

Here, we will follow the same methodology and employ the convolution formula for tree-like generating functions (cf. [188]). Observe that D_m(z) relates to another tree-like generating function, defined as

B(z) = ∑_{k=0}^∞ (k^k / k!) z^k.

This function, in turn, can be shown to be (cf. [188]) B(z) = (1 - T(z))^{-1} for |z| < e^{-1}, where T(z) = ∑_{k=1}^∞ (k^{k-1} / k!) z^k is the well-known tree function counting rooted labeled trees [114], which satisfies the implicit equation

(6)   T(z) = z e^{T(z)},  with |T(z)| < 1.

The convolution formula [188] applied to (5) yields

(7)   D_m(z) = [B(z)]^m.

Consequently, D_{n,m} = (n! / n^n) [z^n] [B(z)]^m, where [z^n] f(z) denotes the coefficient of z^n in f(z). Defining β(z) = B(z/e) for |z| < 1, noticing that [z^n] β(z) = e^{-n} [z^n] B(z), and applying Stirling's formula, (7) yields

(8)   D_{n,m} = √(2πn) (1 + O(n^{-1})) [z^n] [β(z)]^m.
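Formula (5) can be evaluated directly for small n and m, a useful sanity check on the coefficient-extraction route; the sketch below is our own illustrative code, with the convention 0^0 = 1 as in the type sum.

```python
from math import factorial

def multinomial(ks):
    """Multinomial coefficient n! / (k_1! ... k_m!) with n = sum(ks)."""
    r = factorial(sum(ks))
    for k in ks:
        r //= factorial(k)
    return r

def compositions(n, m):
    """All tuples (k_1, ..., k_m) of nonnegative integers summing to n."""
    if m == 1:
        yield (n,)
        return
    for k in range(n + 1):
        for rest in compositions(n - k, m - 1):
            yield (k,) + rest

def D(n, m):
    """D_{n,m} for memoryless sources, by the type sum (5);
    0^0 = 1 by convention."""
    total = 0.0
    for ks in compositions(n, m):
        term = multinomial(ks)
        for k in ks:
            if k:
                term *= (k / n) ** k
        total += term
    return total
```

For a binary alphabet one gets, e.g., D_{1,2} = 2 and D_{2,2} = 5/2, and R*_n = log_2 D_{n,m} then gives the minimax redundancy (up to the O(1) term discussed above).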

Thus, it suffices to extract the asymptotics of the coefficient of z^n in [\beta(z)]^m, for which a standard tool is Cauchy's coefficient formula [114, 188], that is,

(9)   [z^n] [\beta(z)]^m = \frac{1}{2\pi i} \oint \frac{\beta^m(z)}{z^{n+1}} \, dz,

where the integration is along a closed path encircling z = 0 inside which \beta^m(z) is analytic. However, the asymptotic evaluation of the above depends on whether m is finite or a function of n [186, 190].

Minimax Redundancy for Renewal Sources

Let us continue our analytic extravaganza and consider non-finitely parameterized sources. We study here the so-called renewal sources, first introduced in 1996 by Csiszár and Shields [23]. Such a source is defined as follows: let T_1, T_2, \ldots be a sequence of i.i.d. positive-valued random variables with distribution Q(j) = \Pr\{T_i = j\}. The process T_0, T_0 + T_1, T_0 + T_1 + T_2, \ldots is a renewal process. In a binary renewal sequence the positions of the 1's are at the renewal epochs T_0, T_0 + T_1, \ldots, with runs of zeros of lengths T_1 - 1, T_2 - 1, \ldots in between the 1's. The process starts with x_0 = 1. We follow here the analysis presented in [120]. A sequence generated by such a source takes the form

   x_1^n = 1 0^{\alpha_1} 1 0^{\alpha_2} \cdots 1 0^{\alpha_k} \underbrace{0 \cdots 0}_{\alpha^*},

where k_m is the number of i such that \alpha_i = m. Then

   P(x_1^n) = [Q(0)]^{k_0} [Q(1)]^{k_1} \cdots [Q(n-1)]^{k_{n-1}} \Pr\{T_1 > \alpha^*\}.

The last term introduces some difficulties in finding the maximum likelihood distribution, but it can be proved that the minimax redundancy R_n^*(R_0) = \log D_n(R_0) of the renewal source R_0 satisfies

   r_{n+1} - 1 \le D_n(R_0) \le \sum_{m=0}^{n} r_m,

where r_n = \sum_{k=0}^{n} r_{n,k} and

(10)   r_{n,k} = \sum_{I(n,k)} \binom{k}{k_0, \ldots, k_{n-1}} \Bigl(\frac{k_0}{k}\Bigr)^{k_0} \Bigl(\frac{k_1}{k}\Bigr)^{k_1} \cdots \Bigl(\frac{k_{n-1}}{k}\Bigr)^{k_{n-1}}.

Above, I(n,k) is the set of integer partitions of n into k terms, that is,

   n = k_0 + 2 k_1 + \cdots + n k_{n-1}, \qquad k = k_0 + \cdots + k_{n-1}.

Since r_n is too difficult to analyze directly, we rather study s_n = \sum_{k=0}^{n} s_{n,k}, where

   s_{n,k} = e^{-k} \sum_{I(n,k)} \frac{k_0^{k_0}}{k_0!} \cdots \frac{k_{n-1}^{k_{n-1}}}{k_{n-1}!}, \qquad r_{n,k} = \frac{k!}{k^k e^{-k}} s_{n,k},

Figure 4.1. Illustration of the saddle point method

since

   S(z,u) = \sum_{n,k} s_{n,k} u^k z^n = \sum_{n,k} \sum_{I(n,k)} \Bigl(\frac{u}{e}\Bigr)^{k_0+\cdots+k_{n-1}} \frac{k_0^{k_0}}{k_0!} \cdots \frac{k_{n-1}^{k_{n-1}}}{k_{n-1}!} z^{1 k_0 + 2 k_1 + \cdots + n k_{n-1}} = \prod_{i=1}^{\infty} \beta(z^i u),

where \beta(z) = B(z/e) is defined in the previous section. To compare s_n to r_n, we introduce the random variable K_n as follows:

   \Pr\{K_n = k\} = \frac{s_{n,k}}{s_n}.

Stirling's formula yields

   \frac{r_n}{s_n} = \sum_{k=0}^{n} \frac{r_{n,k}}{s_{n,k}} \cdot \frac{s_{n,k}}{s_n} = E\bigl[ K_n! \, K_n^{-K_n} e^{K_n} \bigr] = E\bigl[ \sqrt{2\pi K_n} \bigr] + O\bigl( E[K_n^{-1/2}] \bigr).

Thus

   r_n = s_n E\bigl[ \sqrt{2\pi K_n} \bigr] (1 + o(1)) = s_n \sqrt{2\pi E[K_n]} \, (1 + o(1)).

To understand the probabilistic behavior of K_n, we apply sophisticated tools of analytic combinatorics such as the Mellin transform and the saddle point method [114, 188]. In particular, we must evaluate [z^n] S(z,1) by the saddle point method, which leads to

   s_n = [z^n] S(z,1) \approx [z^n] \exp\Bigl( \frac{c}{1-z} + a \log\frac{1}{1-z} \Bigr),

which is illustrated in Figure 4.1. We prove in [120] the following.

Lemma 4.2. Let \mu_n = E[K_n] and \sigma_n^2 = Var(K_n). Then

   \mu_n = \frac{1}{4} \sqrt{\frac{n}{c}} \log\frac{n}{c} + o(\sqrt{n}), \qquad \sigma_n^2 = O(n \log n) = o(\mu_n^2),

where c = \pi^2/6 - 1.

This leads to our final result, proved in [120].

Theorem 4.3 (Flajolet and Szpankowski, 1998). We have the asymptotics

   s_n \sim \exp\Bigl( 2\sqrt{cn} - \frac{7}{8} \log n + O(1) \Bigr),

which yields

   \log r_n = \frac{2}{\log 2} \sqrt{cn} - \frac{5}{8} \log n + \frac{1}{2} \log\log n + O(1),

where c = \pi^2/6 - 1, and consequently

   R_n^*(R_0) = \frac{2}{\log 2} \sqrt{cn} + O(\log n).
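The algebraic relation between r_{n,k} and s_{n,k} can be verified exactly for small n with rational arithmetic. The sketch below is ours (names are illustrative): partitions in I(n,k) are enumerated as multiplicity vectors, and the e^{-k} factor in s_{n,k} is kept symbolic by working with e^k s_{n,k}, which is rational.

```python
from fractions import Fraction
from math import factorial

def partitions_counts(n, k, part=1):
    """Multiplicity vectors {part_size: count} with sum(size*count) = n and
    sum(count) = k, i.e. the partitions in I(n, k)."""
    if n == 0 and k == 0:
        yield {}
        return
    if part > n or k <= 0:
        return
    for c in range(min(n // part, k) + 1):
        for rest in partitions_counts(n - c * part, k - c, part + 1):
            if c:
                out = dict(rest)
                out[part] = c
                yield out
            else:
                yield rest

def r_nk(n, k):
    """r_{n,k} as in (10)."""
    total = Fraction(0)
    for counts in partitions_counts(n, k):
        term = Fraction(factorial(k))
        for c in counts.values():
            term /= factorial(c)
        for c in counts.values():
            term *= Fraction(c, k) ** c
        total += term
    return total

def s_nk_times_ek(n, k):
    """e^k * s_{n,k}, a rational number."""
    total = Fraction(0)
    for counts in partitions_counts(n, k):
        term = Fraction(1)
        for c in counts.values():
            term *= Fraction(c ** c, factorial(c))
        total += term
    return total

# check r_{n,k} = (k! / (k^k e^{-k})) s_{n,k} exactly for small n
for n in range(1, 9):
    for k in range(1, n + 1):
        assert r_nk(n, k) == Fraction(factorial(k), k ** k) * s_nk_times_ek(n, k)
```

For example, I(3, 2) contains only the partition 3 = 1 + 2, giving r_{3,2} = 1/2.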


PAPER 5

Paper


PAPER 6

Paper


PAPER 7

Paper Seminar


PAPER 8

Paper Seminar


Chapter III

DIGITAL TREES


The digital tree process

Julien Clément, Mark Daniel Ward

The analysis of trees was integrated in Philippe Flajolet's writings and research throughout his life. In particular, he had a very keen interest in the digital tree process [116, 53], omnipresent in computer science, for which the trie data structure is the strongest and best known embodiment. During Philippe's invited lecture at STACS 06, which was devoted to tries, he said that the treatment "cannot be but a brief guide to a rich subject whose proper development would require a book of full length" [53].

1. A central role in computer science

The digital tree process is found practically anywhere that data is classified or sorted, and it has abundant applications. In a nutshell, this process relies on the same principle as the thumb index of a dictionary. As a data structure, the most pervasive kind of digital tree is the retrieval tree, introduced by de la Briandais [28] and Fredkin [126] 1, and usually shortened to trie. (Section 6.3 of Knuth [152] is a very helpful discussion of the fundamentals, which traces the principles of tries to Thue [191].) A partitioning of the data items, often using a sorting or a classification by types, takes place at the root node. The tree is built recursively, according to subsequent bits or digits of the data. The children of the root are sorted further into subtrees and are thus partitioned more finely. The data items (also known as keys or strings) eventually require no more sorting and are ultimately stored in the leaves of the trie. Due to their generality, tries are one of the most widely known and greatly studied data structures for representing a set of words. The trie data structure allows for all the basic algorithmic operations one can expect from a dynamic dictionary-type data structure (inserting, deleting, searching, enumerating) and can also be used to sort a set of strings.
More formally, given an alphabet A = \{a_1, \ldots, a_r\} of cardinality r, and for a prefix-free 2 set of words Y with letters from A, the trie T(Y) associated to Y is

1. As a side note, let us mention that initially Fredkin intended that tries should be pronounced "tree", as in the word retrieval. Alas, the trie data structure escaped its creator and, nowadays, people mostly pronounce it like the word "try", to distinguish it verbally from "tree".
2. The prefix-free property just means that none of the words is a prefix of another.

Figure 8.1. From top to bottom, a fully developed digital tree, a trie and a digital search tree built upon the words from the last sentence of the novel Moby Dick by H. Melville (inserted in order relevant only for the DST): "The Drama's Done. Why then here does any one step forth? - Because one did survive the wreck." In the trie, unary nodes (which would not be present in PATRICIA tries) are double circled. A terminal symbol is added to each word inserted.

defined recursively by the following rules:

   T(Y) := \begin{cases} \emptyset, & \text{if } Y = \emptyset; \\ \sigma, & \text{if } Y = \{\sigma\}; \\ \langle \bullet, T(Y \backslash a_1), T(Y \backslash a_2), \ldots, T(Y \backslash a_r) \rangle, & \text{otherwise,} \end{cases}

where \bullet denotes an internal node and Y \backslash \alpha is the subset built from Y by considering the words that start with the letter \alpha, stripped of their initial symbol \alpha. For a trie, the recursion stops as soon as Y contains fewer than two elements. Thus, in order to build the trie for Y, one needs to consider only the minimal set of prefixes from Y by which all words are distinguished from one another.

The prefix-freeness condition is merely technical and is easily enforced by adding a terminal symbol (not belonging to the alphabet) to each string. (If two strings have the same finite length, these terminal symbols might need to be different to avoid collisions of two identical strings; e.g., see the two occurrences of the word "one" in the Moby Dick example.) The abstract data structure has given rise to many algorithmic variants. For instance, PATRICIA tries 3 [164] are tries where only useful internal nodes, i.e., those participating in the branching process, are kept. Thus, in this setting, unary nodes (with only one child) are removed. At the other end of the spectrum lies the digital search tree (DST): it uses the same partitioning process as tries, but strings are stored inside internal nodes (as in binary search trees). Unlike the case of tries, a DST for a set of words Y depends on the order in which the strings are inserted. So, loosely speaking, digital search trees are intermediate between tries and binary search trees. When it comes to implementing digital trees, even for usual tries, several options are possible, depending on the decision structure chosen in each node to guide the descent to the subtrees (arrays of pointers, linked lists, or binary search trees, for instance). Some variants of digital trees have been precisely analyzed by Philippe Flajolet [102, 19, 21]. From a more conceptual point of view, the structure of a trie can be used to model or analyze the behavior of both deterministic and stochastic algorithms in computer science. Tries are especially relevant to branching and sorting processes.
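The recursive definition of T(Y) translates almost verbatim into code. The following is an illustrative sketch of ours, not code from the chapter: internal nodes are dictionaries keyed by letters, a leaf stores the remaining suffix of its word, and a terminal symbol '$' enforces prefix-freeness.

```python
def trie(Y):
    """Build T(Y): None for the empty set, the word itself for a singleton,
    otherwise an internal node mapping each first letter a to the trie of
    Y \ a (the suffixes of the words of Y starting with a)."""
    Y = list(Y)
    if not Y:
        return None
    if len(Y) == 1:
        return Y[0]                      # leaf storing the remaining word
    children = {}
    for w in Y:
        children.setdefault(w[0], []).append(w[1:])
    return {a: trie(suffixes) for a, suffixes in children.items()}

# a terminal symbol makes any finite set of words prefix-free
words = [w + "$" for w in ["the", "then", "they", "to"]]
T = trie(words)
assert T["t"]["o"] == "$"                          # "to" splits off after 't'
assert set(T["t"]["h"]["e"]) == {"$", "n", "y"}    # "the", "then", "they"
```

Note how the recursion stops exactly when fewer than two words remain, so only the minimal distinguishing set of prefixes is materialized.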
So it is not surprising that the digital tree process has ramifications in the management of large databases (dynamic hashing [70], see also the introduction to FIXTHIS-the Chapter on Hashing; probabilistic counting [44], see FIXTHIS-Chapter on Approximate Counting), in communication protocols [64] (for instance, for leader election [143, 170]), data compression (Lempel-Ziv and its variants [197], suffix trees [41, 156]), pattern matching [8, 138], random generation (to analyse precise schemes [95] or used as auxiliaries in Buffon machines [77]) and finally, rather unexpectedly, in computational geometry (for exact comparison of rationals [122]). The digital tree process is elegant and simple. In its algorithmic form, it is intuitive to implement and utilize. This helps explain why the algorithmic, analytic and probabilistic aspects of digital trees are fundamental in both theoretical and applied domains in computer science.

2. Digital trees in Philippe Flajolet's works

We can identify three main periods with respect to the study of digital trees by Philippe Flajolet. The first period corresponds roughly to the early 1980s and focuses on tries and their applications in computer science. One must remember that, at this time of precursors, the average case analysis of algorithms was not yet a well recognized field among computer scientists (despite the spreading of Knuth's ideas). Worst-case analysis, focusing on pathological cases, was then the norm for studying the efficiency of algorithms. Tries (especially binary tries) and the underlying dyadic partition process were (and

3. For Practical Algorithm to Retrieve Information Coded in Alphanumeric.

still are) ideal tools for demonstrating the utility of average case analysis. Indeed, although the trie data structure is very efficient on average (and, indeed, can compete with the best known data structures in many applications), the worst case complexity is unbounded 4! It is worth noting that these first papers [116, 43, 91] introduce, in a pedagogical way, the case of binary tries built on a set of finite words of the same length l (choosing n words amongst the 2^l possible ones). This is a first combinatorial model that is easy to grasp. Of course, it is not surprising that this model coincides, as l tends to infinity, with the trie model for infinite binary strings, which has proved to be the natural framework for the analysis of tries; this framework is also rigorous, using a probability measure discussed in [169]. In view of the applications (for instance, dynamic hashing or sorting), a general and symbolic methodology for the average case analysis of digital trees is presented, with a distinction made between additive parameters, such as the size or the path length, and extremal (or multiplicative) parameters, like the height. During the second period, from the mid 1980s to the mid 1990s, Philippe Flajolet's work on digital trees is more oriented towards the refinement of methodological tools for studying variations of tries (which are numerous: multi-way tries, PATRICIA tries, quadtries, k-dimensional tries, LC-tries, ternary search tries, digital search trees, etc.). What was certainly appealing to Philippe Flajolet is that these analyses lead to challenging mathematical problems. He sharpened and generalized several analytic tools by working with digital trees.
Amongst the many techniques used by Philippe and his co-authors to analyze digital trees and their variants [43, 91, 93, 60], we emphasize his pioneering work to systematically apply generating functions, the symbolic method, singularity analysis, the saddle point method, the Mellin transform, and poissonization/depoissonization techniques. Using tries and variants, he made many fruitful incursions into domains such as polynomial factorization [117], algebraic methods [91], differential equations [93], and random number generation [95]. For roughly the last 12 years of his life, he paid particular attention to developing and analyzing general probabilistic frameworks and tools [21, 53, 192, 94], characterizing the stochastic generation of the words and strings which are inserted in digital trees 5. This aspect is clearly related to information theory (see Chapter FIXTHIS-XYZ on Information Theory). Indeed, one fundamental aspect of random digital trees is that the randomness essentially comes from the set of words they are built on; hence, there is a need to precisely model the source producing (infinite) words. This need was resolved by a very general framework introducing dynamical sources (mainly designed by Brigitte Vallée; see the corresponding chapter/volume-FIXTHIS-on-Dynamic-Sources for related works of Philippe Flajolet), which relies on transfer operators from dynamical systems theory. In this framework, various parameters of tries are analysed [21], showing that the constants involved are related to intrinsic properties of the source, like its entropy. The Grail of such a study would be to consider for the analyses a totally general framework where a (stochastic) source is induced from the family of fundamental probabilities \{p_w\}_{w \in A^*}, where p_w is the

4. Intuitively, two words in a trie can share a prefix of arbitrary length; fortunately, the probability of having any infinite-length overlaps is 0 in the naturally induced probability measure.
5.
A study initiated by Devroye in [30] for binary tries.

probability that an infinite string produced by the source on the denumerable alphabet A begins with w. P. Flajolet's methodology was then to put the stochastic properties of the source into correspondence with the analytic properties of the Dirichlet series

   \Lambda(s) = \sum_{w \in A^*} p_w^s,

for a complex parameter s; thus a connection is forged between a stochastic and an analytic approach, which yields results for the analysis of algorithms. This unifying view illuminates, for instance, the ways in which the fundamental characteristics of the source (like the entropy) appear in critical ways during the analyses of algorithms or data structures related to digital trees, especially when computing constants in asymptotic expansions related to generating functions of tree parameters. This approach allows one to relate, for example, the study of the famous Quicksort algorithm for strings [192] (considering the keys to be strings produced by a source) to that of ternary search tries (which mix binary search trees inside tries, whereas Quicksort for strings mixes tries inside a binary search tree). A recurring theme in the analysis of tries is the appearance of minute oscillatory phenomena under some conditions. These oscillations are small but are nonetheless of great interest for the precise asymptotic mathematical analysis. In fact, this is inherent to the partitioning process, especially in the simplest stochastic model, in which words are drawn from an unbiased memoryless source. One key aspect for evaluating these oscillations together with error terms is to study precisely the poles of the Mellin transform, relying on geometric and arithmetic conditions [94] of the source 6.

Methodology. We concentrate here more on tries.
Some of the main parameters for tries are the size (the number of internal nodes), related to the memory space needed to store the data structure; the external path length (the sum of the lengths of all paths from the root to the leaves), which relates to the construction cost of the data structure; and the height of the trie (corresponding to the worst case for the number of symbol comparisons between any two strings in a set of strings). A random trie built over 500 uniformly generated strings, originally displayed in [53], is given in Figure 8.2. As already mentioned, the studies differ according to whether we are interested in additive (size, path length) or multiplicative (height) parameters. However, there are always two main steps: the first is algebraic and provides exact expressions; the second is analytic and aims at providing precise asymptotics. An additive parameter \gamma for a trie T(Y) built on the set of strings Y is decomposed, thanks to a toll function \tau, in the following recursive way:

   \gamma(T(Y)) = \tau(Y) + \sum_{\alpha \in A} \gamma(T(Y \backslash \alpha)),

6. This leads to a surprising irruption of the Riemann hypothesis, to the great pleasure of P. Flajolet, when studying tries built upon words resulting from the continued fraction expansion of random numbers of the unit interval (see [21]).

Figure 8.2. A random trie of size n = 500 built over uniform data [53]

where \tau(Y) is a toll function associated to the root of the trie. For instance, the toll functions for the size (number of internal nodes) and the external path length are, respectively (using the Iverson bracket notation, i.e., [B] is one if property B is true and zero otherwise),

   \tau(Y) = [\mathrm{Card}(Y) \ge 2], \qquad \tau(Y) = \mathrm{Card}(Y) \cdot [\mathrm{Card}(Y) \ge 2].

Concerning the algebraic step, when the number of strings in the trie is a Poisson random variable N with parameter z,

   P(N = k) = e^{-z} \frac{z^k}{k!}

(this is called a Poissonized model), instead of a fixed number n (as in the traditional Bernoulli model), the calculations are greatly simplified in the resulting model. The Poisson distribution is well concentrated around the mean z, so that by choosing the parameter z = n the two models agree in many ways. Expressions in the Poisson model are also called Poissonised generating functions and are related to exponential generating functions. At this stage, for additive parameters of the trie, we usually obtain a functional equation for the Poissonised generating function, which can be iterated, yielding an infinite sum. The variable z is then interpreted as a complex-valued variable, and asymptotic methods are used. The expressions computed at the previous stage either are, or are very close to, harmonic sums of the form

   G(z) = \sum_{k \in K} \lambda_k \, g(\mu_k z),

where the families (\lambda_k) and (\mu_k) are called amplitudes and frequencies and g is the base function. Then the tool of choice to study the asymptotics of harmonic sums is the Mellin

transform, as it isolates the transform of the base function and a Dirichlet-type series involving only the \lambda_k's and \mu_k's. We refer to the corresponding chapter [FIXTHIS-CITE-Dumas-chapter] for a more precise description. The next step is then to relate the asymptotic development to the poles of the Mellin transform using methods from analytic combinatorics. The final, required step is to perform depoissonisation, i.e., to interpret the degree to which the results from the Poissonized model are also valid in the (original) Bernoulli model, in which the number of strings is fixed. We note that this is the most standard path to analyse tries (or related structures). It is sometimes referred to as the Poisson-Mellin-Newton cycle. But other paths are possible, for instance using the Rice-Nörlund formula [192] (instead of the Mellin transform) after an algebraic depoissonisation step. The schema does not apply to multiplicative parameters. The precise analysis of the distribution of the height relies on a saddle point estimate [44].

Tries and the Analytic Combinatorics book. Although digital trees were a focus of interest throughout Philippe Flajolet's whole career, it is worth noting that digital trees are not present 7 in the book co-authored with R. Sedgewick [115], which is undoubtedly bound to be the reference in the field of analytic combinatorics. It may be useful to put this in perspective. Indeed, there was discussion between the authors, P. Flajolet and R. Sedgewick, to decide the fate of digital trees, together with the Mellin transform, with respect to the book. The Mellin transform, and the analysis of tries as an advanced application, were in fact originally planned to be included as a book chapter [108]. These topics were omitted, however, in the final stages because, according to R. Sedgewick, revising and incorporating the full discussion would have caused too much delay for this long-awaited book.
One may also infer that digital trees do not completely fit into the philosophy of the book, since problems related to a trie are of a more stochastic than combinatorial nature. Indeed, for digital trees, one is often confronted with generating functions for parameters satisfying a functional equation, from which an explicit expression (in the form of an infinite sum) is deduced by iteration. But these generating functions are not generating functions for combinatorial structures.

3. Conclusion

This whole chapter illustrates why the digital tree process is central in computer science, and how the related analyses bring deep mathematical tools into play. A small reading guide. The beautiful survey [53] by Philippe Flajolet himself is surely a wonderful entry point into the subject. The reader interested in algebraic methods for tries should be delighted with [91]. Two articles are more concerned with methodology and in particular extend the analyses to digital search trees [93, 102] (for digital search trees, the functional equations now involve differentiation). Finally, a general framework where strings are generated by very general sources, together

7. However, we remark that tries are the subject of a chapter in the introductory book by the same authors, Flajolet and Sedgewick [180] (with an average-case analysis relying only on elementary computations).

with analyses of related structures (like tries) and sorting & searching algorithms is found in [21, 192, 94].
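As a tiny closing illustration, the additive-parameter recursion \gamma(T(Y)) = \tau(Y) + \sum_\alpha \gamma(T(Y \backslash \alpha)) from the Methodology section can be evaluated directly on a set of strings. This sketch is ours (the toll functions mirror the size and external path length tolls given in the chapter).

```python
def additive(Y, toll):
    """gamma(T(Y)) = toll(Y) + sum over letters alpha of gamma(T(Y \ alpha));
    the recursion stops when Y holds fewer than two strings."""
    if len(Y) < 2:
        return 0
    groups = {}
    for w in Y:
        groups.setdefault(w[0], []).append(w[1:])
    return toll(Y) + sum(additive(g, toll) for g in groups.values())

def size(Y):               # toll [Card(Y) >= 2], always 1 on a recursive call
    return additive(Y, lambda S: 1)

def path_length(Y):        # toll Card(Y) * [Card(Y) >= 2]
    return additive(Y, len)

Y = ["aa$", "ab$", "b$"]
assert size(Y) == 2            # the root plus the node separating aa$ from ab$
assert path_length(Y) == 5     # leaves at depths 2, 2 and 1
```

The same skeleton accepts any toll function, which is precisely the point of the additive-parameter decomposition.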

PAPER 9

Paper


Chapter IV

MELLIN TRANSFORM


Dr Flajolet's elixir or Mellin transform and asymptotics

Philippe Dumas

Philippe Flajolet (abbreviated PF in the sequel) greatly developed the use of the Mellin transform in the asymptotic evaluation of combinatorial sums that appear in the average case analysis of algorithms. In fact, the Mellin transform runs throughout PF's work, from the beginning [74] to the very end [13].

Mellin transform and fundamental strip

The Mellin transform is an integral transform, like the Laplace transform or the Fourier transform. It takes as input the original f(x), a function of a real variable defined on the positive half-line. It produces the image f^*(s), a function of a complex variable, defined by

   f^*(s) = \int_0^{+\infty} f(x) \, x^{s-1} \, dx.

It is not clear that the formula actually defines anything, but the kernel x^{s-1} leads us to a comparison with the powers of x. It is readily seen that the assumption f(x) = O(x^{-\alpha}) as x \to 0 guarantees the convergence of the lower part of the integral, say from 0 to 1, for complex numbers s whose real part is greater than \alpha, that is, for the s which are to the right of \alpha. We can make a similar assumption about the behavior at infinity. In this way, the image f^*(s) is defined within the intersection of a right half-plane and a left half-plane. This is a strip, called the fundamental strip. Certainly the most basic example of a Mellin transform is the gamma function,

   \Gamma(s) = \int_0^{+\infty} e^{-x} x^{s-1} \, dx.

It is the Mellin transform of the exponential e^{-x}. In that case the original is O(1) at 0, so the left abscissa of the fundamental strip is 0. The original decreases, as x tends to infinity, more rapidly than every power of x, and the right abscissa is +\infty. Hence the fundamental strip in this case is the right half-plane Re(s) > 0. But the gamma function extends to the whole complex plane as a meromorphic function, and the extension has poles at

Figure 9.1. Absolute value of the gamma function.

all the nonpositive integers, hence the peaks on the left-hand side of Figure 9.1.

Symbolic analysis

Imagine a flat landscape, something like a flat sand desert. This country is the complex plane. A track runs straight through it: this is the real axis. But you are thinking of a meromorphic function, and suddenly this changes the landscape. Some hills or even prodigious mountains appear, and at the top of these mountains some placards are fixed on poles. Each summit is located above a pole of the meromorphic function, and on the placard you can read the singular part of the function at the pole. For example, if you are thinking of the gamma function, you see an infinity of chimneys aligned on the negative part of the real axis, which disappear at the horizon in a haze of heat. On the placard at abscissa -k, you read (-1)^k / (k!\,(x + k)). Your fantasy knows no limits, and if you thought only for a moment about Stirling's formula, a placard appears on the other side, at the end of the real positive axis, with its wording

   \Gamma(x) \sim_{x \to +\infty} (x/e)^x \sqrt{2\pi/x}.

The more you direct your gaze toward a part of the landscape, the more details spring up. There is no doubt that PF had such a mental image of analytic functions, certainly in a more subtle way, refined by more than thirty years of practice. Indisputable evidence is given by the introduction to the saddle-point method in [115, Chap. VIII], the discussion about coalescing saddle-points in [4], or the picture in [55] illustrating the application of the saddle-point method to a generalized exponential integral.
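Before going further, the defining integral itself is easy to check numerically. The sketch below is ours, not part of the chapter: a crude quadrature of the truncated integral recovers the gamma function for a few real values of s.

```python
import math

def mellin_transform(f, s, upper=60.0, steps=120000):
    """Composite Simpson approximation of int_0^infty f(x) x^{s-1} dx,
    truncated at `upper`; crude, but enough for a sanity check (real s > 1)."""
    h = upper / steps
    def g(x):
        return f(x) * x ** (s - 1) if x > 0 else 0.0
    acc = g(0.0) + g(upper)
    for i in range(1, steps):
        acc += (4 if i % 2 else 2) * g(i * h)
    return acc * h / 3

for s in (1.5, 2.5):
    est = mellin_transform(lambda x: math.exp(-x), s)
    # the Mellin transform of e^{-x} is Gamma(s)
    assert abs(est - math.gamma(s)) < 1e-4
```

The truncation at x = 60 is harmless here because e^{-x} kills the tail; for slowly decaying originals, a larger cutoff (or a smarter scheme) would be needed.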

De Bruijn, 1948: \sum_{k \ge 0} \ln \frac{1}{1 - e^{-x r^k}}
De Bruijn, Knuth, Rice, 1972: \sum_{k \ge 1} \nabla\Delta\, d(k)\, e^{-k^2 x},  x = 1/n
Knuth, 1973: \sum_{j \ge 1} 2^j \bigl( e^{-x/2^j} - 1 + x/2^j \bigr),  x = n
Sedgewick, 1978: \sum_{k \ge 1} F(k)\, e^{-k^2 x},  x = 1/n
Kemp, 1979: \sum_{k \ge 1} v_2(2k)\, e^{-16 k^2 x},  x = 1/n

TABLE 1. The first uses of the Mellin transform approach related to the analysis of algorithms, with their authors and date, and the harmonic sums therein.

Similarly, the search for summatory formulas is reduced to a purely formal handling [98]; the asymptotic study of divide-and-conquer type sums is reduced to picking residues [58, 62]. Always he was defining rules that provide an automatic treatment of issues and reduce mathematical analysis to algebra and rewriting systems [100, 51]. Here we try to mimic this attitude and concentrate on the ideas, neglecting the mathematical assumptions.

Fundamental result

The fundamental result about the Mellin transform is the following. There is a strong correspondence between the behavior of the original at 0 and the poles of the image in the half-plane to the left of the fundamental strip. Similarly, the behavior at infinity of f(x) is related to the poles of f^*(s) in the half-plane to the right of the fundamental strip. The correspondence is explicit and given by the following formulas: a term x^{\xi} \ln^k x in the expansion at 0 corresponds exactly to a singular term (-1)^k k! / (s + \xi)^{k+1} for a pole at -\xi to the left of the fundamental strip. The formula is the same for the expansion at +\infty and the poles on the right-hand side of the strip, but with an opposite sign: to -(-1)^k k! / (s + \xi)^{k+1} corresponds x^{\xi} \ln^k x. Particularly, for simple poles (that is, k = 0), it is very simple: on the left-hand side the coefficients of the expansion at 0 are the residues at the poles, and similarly on the right-hand side. Clearly, the correspondence works very well in the case of the gamma function:

(1)   \Gamma(s) \asymp_{\operatorname{Re}(s) \le 0} \sum_{k=0}^{+\infty} \frac{(-1)^k}{k!} \frac{1}{s + k}, \qquad e^{-x} \asymp_{x \to 0} \sum_{k=0}^{+\infty} \frac{(-1)^k}{k!} x^k.
Note that in (1) we do not claim that the series converges or even that the sum is the gamma function. This equation is only a formal writing, in line with symbolic analysis.
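The correspondence in (1) can nevertheless be tested numerically: the residue of the gamma function at s = -k should equal the coefficient of x^k in the expansion of e^{-x} at 0. The sketch below is ours; the residue is estimated by evaluating eps * Gamma(-k + eps) for a small eps.

```python
import math

def gamma_residue(k, eps=1e-7):
    """Residue of Gamma at s = -k, estimated as eps * Gamma(-k + eps)."""
    return eps * math.gamma(-k + eps)

for k in range(6):
    taylor_coeff = (-1) ** k / math.factorial(k)   # [x^k] e^{-x}
    assert abs(gamma_residue(k) - taylor_coeff) < 1e-4
```

This is exactly the left-hand side of the fundamental correspondence: residues on the left of the strip match the coefficients of the expansion of the original at 0.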

Harmonic sums

The second component of the story is the notion of harmonic sum. We start with a base function f and its Mellin transform. To the base function, we associate a harmonic sum

(2)   F(x) = \sum_k \lambda_k \, f(\mu_k x).

This is a linear combination of dilations of the base function. We merge both ideas and we obtain a very simple result about the Mellin transform F^*(s) of the harmonic sum. It is the product of the Mellin transform f^*(s) of the base function and a generalized Dirichlet series, which depends only on the coefficients involved in the harmonic sum:

(3)   F^*(s) = f^*(s) \sum_k \lambda_k \mu_k^{-s}.

Zigzag method

We understand the power of the previous results by playing with the zigzag method, going back and forth between originals and images. We start with our favorite example f_1(x) = e^{-x} and its Mellin transform, the gamma function. We consider an alternative function,

   f_2(x) = \frac{x}{e^x - 1} = \sum_{k=1}^{+\infty} x e^{-kx},

which is actually a harmonic sum. We compute its Mellin transform and we collect its poles:

   f_2^*(s) = \Gamma(s+1) \zeta(s+1) \asymp_{\operatorname{Re}(s) \le 0} \frac{1}{s} + \sum_{j=0}^{+\infty} \frac{(-1)^j \zeta(-j)}{j!} \frac{1}{s + j + 1}.

They are 0 and the negative integers. But we know that

   f_2(x) = 1 - \frac{x}{2} + \sum_{k=1}^{+\infty} \frac{B_{2k}}{(2k)!} x^{2k}

is the generating function of the Bernoulli numbers. Moreover, these numbers are zero for odd indices starting at 3, hence the writing

   f_2^*(s) \asymp_{\operatorname{Re}(s) \le 0} \frac{1}{s} - \frac{1}{2} \frac{1}{s+1} + \sum_{k=1}^{+\infty} \frac{B_{2k}}{(2k)!} \frac{1}{s + 2k}.

As a consequence, the zeta function vanishes at the negative even integers. We now bring into the game a new harmonic sum,

   f_3(x) = \sum_{k=1}^{+\infty} e^{-k^2 x^2},

and compute its Mellin transform:

   f_3^*(s) = \frac{1}{2} \Gamma\Bigl(\frac{s}{2}\Bigr) \zeta(s).

The vanishing of zeta at all the even negative integers removes almost all the poles of the gamma function. There remain only two poles, at 0 and 1. With

   f_3^*(s) \asymp_{\operatorname{Re}(s) \le 1} \frac{\sqrt{\pi}}{2} \frac{1}{s - 1} - \frac{1}{2} \frac{1}{s}

(again, the writing is purely formal), we readily obtain the expansion of the function at 0:

   f_3(x) =_{x \to 0} \frac{\sqrt{\pi}}{2} \frac{1}{x} - \frac{1}{2} + O(x^{+\infty}).

Once we have understood the trick, it is not difficult to deal with other examples, like

   f_4(x) = \sum_{k=1}^{+\infty} d(k) \, e^{-k^2 x^2}.

Here, d is the divisor function and the Mellin transform is f_4^*(s) = \Gamma(s/2) \zeta(s)^2 / 2. Again we obtain the expansion at 0 easily (the double pole at 1 makes a logarithm arise):

   f_4(x) =_{x \to 0} -\frac{\sqrt{\pi}}{2} \frac{\ln x}{x} + \frac{\sqrt{\pi}}{4} (3\gamma - \ln 4) \frac{1}{x} + \frac{1}{4} + O(x^{+\infty}).

Average-case analysis of algorithms and harmonic sums

The application of the previous ideas to the analysis of algorithms began in the seventies, with a 1972 article [27] of Nicolaas G. De Bruijn, Donald E. Knuth, and Stephen O. Rice about the height of rooted plane trees (Table 1). Knuth [150, p ] refers to the method of the gamma function and credits De Bruijn with first having this idea. De Bruijn had used it in a 1948 paper [26] about the asymptotic evaluation of the number of binary partitions. Next we encounter the study of radix-exchange sorting by Donald Knuth [150], of odd-even merging by Robert Sedgewick [179], and of register allocation for binary trees by Rainer Kemp [144]. In every case, a harmonic sum comes out (in the expressions of Table 1, d is the divisor function, \nabla and \Delta are the backward and forward difference operators, and v_2 is the dyadic valuation function). The first sum is a generating function, while the other ones are combinatorial sums, but they are all amenable to the same treatment.
According to [71], PF learned the Mellin transform from Rainer Kemp. In a 1977 work about register allocation [86, 87], PF and his coauthors follow an elementary way à la Delange, but in a 1978/1979 talk [42] at the Séminaire Delange-Pisot-Poitou PF gives an explanation of the Mellin-Fourier transform, in words only but in a totally clear way. PF systematized the idea starting in the eighties and completely defined the method in the early nineties. This led him to write first [88], a very illuminating presentation of the Mellin transform in the context of the analysis of algorithms, and next [60], a more comprehensive version. He returned to the topic in [108], in which the reader can find not only examples but even exercises.

PF, Odlyzko, 1981 [73, 74]: average height of simple trees
PF, Puech, 1983 [82, 84]: retrieval of multidimensional data
Fayolle, PF, Hofri, 1986 [36, 37]: multi-access broadcast channel
PF, Richmond, 1992 [92, 93]: generalized digital trees
Mahmoud, PF, Jacquet, Régnier, 2000 [160, 161]: bucket selection and sorting
PF, Fusy, Gandouet, Meunier, 2007 [54]: cardinality estimation
Broutin, PF, 2010 [13]: height in non-plane binary trees

TABLE 2. Some of PF's contributions, with their authors, dates, and references, and the topics under consideration; each of these papers gives rise to a harmonic sum of the kind discussed above.

Table 2 provides a small sample of PF's contributions. The third example [36, 37] is impressive: the sum runs over the affine transforms σ(z) = µ + rz in a semi-group H generated by two affine transforms σ1(z) = λ + pz, σ2(z) = λ + qz with p + q = 1. The scope of application is quite broad and we refer to [60, p. 5] for a list of relevant fields.

Exponentials in harmonic sums

It is remarkable how often the exponential function appears in harmonic sums. The reason is the following. In the process of analyzing an algorithm, we are faced with combinatorial sums, which generally are not harmonic sums. But there may be a suitable approximation which is a harmonic sum. There are essentially two rules. The first is the approximation of a large power by an exponential,

(1 − a)^n = e^{−na} (1 + O(na²))  (n → +∞), with na = O(n^ε), 0 < ε < 1/2.
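This first rule can be illustrated numerically. In the sketch below the particular choice a = n^(−3/4) is ours (so that n·a² = n^(−1/2) → 0), not the document's:

```python
import math

# Rule: (1 - a)^n = e^{-n a} (1 + O(n a^2)).
# Hypothetical choice a = n^(-3/4), so that n * a^2 = n^(-1/2) -> 0.
n = 10 ** 6
a = n ** -0.75
exact = (1 - a) ** n
approx = math.exp(-n * a)
rel_err = abs(exact / approx - 1)
print(rel_err, n * a * a)  # the relative error is indeed of order n * a^2
```

The printed relative error is roughly (n·a²)/2, matching the stated error term.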

Here is an example related to the analysis of the radix-exchange sort algorithm [150, p. 131]:

Σ_{k=0}^{+∞} (1 − (1 − 1/2^k)^n) = F(n) + O(1/n)  (n → +∞), F(x) = Σ_{k=0}^{+∞} (1 − e^{−x/2^k}).

Other examples can be found in [70, p. 188], [84, p. 230], [76, p. 388], [13, p. 131], or [125, p. 71]. The second rule is the approximation of the binomial distribution by a Gaussian distribution,

C(2n, n−k)/C(2n, n) = e^{−w²} (1 + o(1))  (n → +∞), with k = w√n, k = o(n^{3/4}).

It appears in the study of the expected height of plane (Catalan) trees [27, p. 20],

(1/C(2n, n)) Σ_{k≥1} d(k) C(2n, n−k) = G(1/√n) + o(1)  (n → +∞), G(x) = Σ_{k≥1} d(k) e^{−k²x²},

or in the study of odd-even merging [179], [85, p. 153], resumed in [46, p. 286] and [194, p. 478].

Technical point

For the benefit of the reader who wants to apply the Mellin transform to his/her own problem, we leave for a while the formal style and enter into analysis. Frequently, the original f(x) is defined not only on the positive real axis, but on a sector |arg x| < ω of the complex plane. This constrains the image strongly. In this case, it satisfies

(4) f*(s) = O(e^{−ω |Im(s)|})  (Im(s) → ±∞).

Such an inequality (not necessarily of exponential type) is the key point allowing the use of the inverse formula

(5) f(x) = (1/2πi) ∫_{(c)} f*(s) x^{−s} ds

(see the proof of Theorem 4 in [60]). In this formula, (c) is the vertical line at abscissa c taken in the fundamental strip. It is noteworthy that x can be a complex variable and not only a positive real variable, contrary to what we started with ([31], or [140] for a brief account). This is of practical importance, since the application of the Mellin transform to a generating function is frequently the first step of an analytical process. It provides the local behaviour of the function at a distinguished point, and can be followed by the use of the Cauchy formula, for example with the saddle-point method [26, 35], or followed by singularity analysis [48, p. 397], [76, p. 238], [13, p. 24].

Figure 9.2. The Mellin transform captures oscillating behavior of a very small amplitude.

Oscillations

The study of the number of registers necessary to evaluate an expression represented by a binary tree is perhaps the most classical example which provides a harmonic sum. PF and Helmut Prodinger dealt with a variant of the problem in a 1986 paper [80]. They begin by revisiting the case of a binary tree. They need to know the local behaviour of the function

E(z) = ((1 − u²)/u) Σ_{k≥1} v2(k) u^k

in the neighborhood of 1/4. For this, they perform some changes of variables,

z = u/(1 + u)², u = (1 − r)/(1 + r), r = √(1 − 4z), u = e^{−t}

(z = 1/4 corresponds to u = 1 and t = 0), and a harmonic sum V(t) comes out. They then compute its Mellin transform,

V(t) = Σ_{k≥1} v2(k) e^{−kt}, V*(s) = Γ(s) ζ(s)/(2^s − 1).

They collect the coefficients of the asymptotic expansion of V(t) at 0 by a process which seems now to be routine. Because of the denominator 2^s − 1, there is a line of poles χ_k = 2kπi/ln 2 on the imaginary axis. These poles are regularly spaced and

contribute a trigonometric series C + P(log2 t), with

C = (γ − ln 2π)/(2 ln 2) + 1/4, P(log2 t) = Σ_{k≠0} (Γ(χ_k) ζ(χ_k)/ln 2) e^{2kπi log2 t}.

In this way they obtain the expansion they are looking for:

V(t) = 1/t + (ln t)/(2 ln 2) + C + P(log2 t) + O(t)  (t → 0),

E(z) = r log2 r + (2C + 1) r + 4 r P(log2 r) + O(r²)  (z → 1/4), r = √(1 − 4z).

The key point is the occurrence of a function which is 1-periodic with respect to log2 t. The gamma function decreases very rapidly on the imaginary axis, and this periodic function therefore has a very small amplitude. (Figure 2 displays the graph of P(log2 t).) More precisely, the magnitudes are C ≈ −0.66, |Γ(χ1)| ≈ 5.45·10⁻⁷, |Γ(χ2)| ≈ 2.5·10⁻¹³, |Γ(χ3)| ≈ 1.4·10⁻¹⁹. One could say that this function is so small as to be of no importance. But it emphasizes the difficulty of obtaining such an asymptotic expansion by elementary arguments. This point in particular delighted PF [163, Comment 5, p. 226], [?, p. 8], [70, p. 206].

Related topics

In this small introduction to the Mellin transform, we have neglected many issues. Among them, the Rice formula permits the study of high-order differences of a sequence. Also, the Poisson-Mellin-Newton cycle relates the Poisson generating function of a sequence, its Mellin transform, and the Rice integral. A good reference on these topics is PF and Robert Sedgewick's 1995 article [107]. Another topic omitted here is the Mellin-Perron formula. It is presented in the next chapter of this volume. The Mellin transform is a pivotal ingredient in depoissonization [140, 54]. It is also related to the Lindelöf representation, which appears in [50, Formula (11)], [51, p. 565], itself connected with the magic duality. PF often spoke about this topic, but he wrote very little about it. It is alluded to in [55] and developed briefly in a note of Analytic Combinatorics [115, p. 238]. The right reference is [154, Chap. V].
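As a closing numerical aside on the oscillations discussed above: the tiny amplitudes |Γ(χ_k)| follow from the reflection formula on the imaginary axis, |Γ(iy)|² = π/(y sinh πy), which a few lines of Python confirm:

```python
import math

# Reflection formula on the imaginary axis: |Gamma(iy)|^2 = pi / (y * sinh(pi * y)).
def gamma_modulus(y):
    return math.sqrt(math.pi / (y * math.sinh(math.pi * y)))

chi = [2 * k * math.pi / math.log(2) for k in (1, 2, 3)]  # imaginary parts of chi_k
amps = [gamma_modulus(y) for y in chi]
print(amps)  # roughly 5.4e-07, 2.5e-13, 1.4e-19
```

The exponential decay of sinh(πy) is what makes the periodic fluctuations all but invisible numerically.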


PAPER 10


Chapter V. DIVIDE AND CONQUER


Divide-and-Conquer Recurrences and the Mellin-Perron Formula

Yun Kuen Cheung, Mordecai Golin

Deriving the asymptotic behavior of functions described by divide-and-conquer recurrences is often quite easy. Deriving the exact behavior can be difficult, though. Philippe Flajolet and his collaborators helped us see how the use of tools from analytic combinatorics, in particular the Mellin-Perron formula and its variations, permits mechanical derivation of the exact behavior. In particular, these tools make clear the periodic terms that often arise out of such recurrences. He also showed how many number-theoretic (digital sum) functions can be analyzed in the same way.

1. Introduction

Divide-and-conquer recurrences model many algorithmic and mathematical problems. The most basic such recurrence is the binary one,

(1) n ≥ 2, f_n = 2 f_{n/2} + e_n,

where f_1 and the e_n are given. Such a recurrence corresponds physically to solving a problem of size n by (1) splitting the problem into two equal-size subproblems of size n/2; (2) solving the two subproblems recursively in time 2 f_{n/2}; and then (3) combining the two subsolutions in time e_n to derive a solution to the full problem. The canonical example of such a problem is bottom-up list-mergesort, in which a list of n elements is sorted by (1) first splitting the list into two lists of size n/2; (2) then recursively sorting the two lists; (3) and finally merging the two sublists using a maximum of e_n = n − 1 comparisons. Here f_n denotes the worst-case number of comparisons needed to sort n items; its initial condition is f_1 = 0.

It is well known, e.g., from the Master Theorem [22, p. 73], that

e_n = o(n) implies f_n = Θ(n);
e_n = Θ(n) implies f_n = Θ(n log n);
e_n = Θ(n^k), k > 1, implies f_n = Θ(n^k).

More discriminating versions of this theorem can replace (1) with more general recurrences of the form f_n = a f_{n/b} + e_n. Furthermore, in many cases these theorems provide exact first-order asymptotic terms. So, it might seem as if the analysis of such recurrences was completely understood. That they are not is because equations of the form (1) do not completely capture the underlying physical processes. More specifically, sets of odd size n cannot be split into two equal-size subsets. For example, in Mergesort, the original set of size n is split into one subset of size ⌈n/2⌉ and the other of size ⌊n/2⌋. Thus (1) must be replaced by

(2) f_n = f_{⌈n/2⌉} + f_{⌊n/2⌋} + e_n.

Even in the simple Mergesort case of e_n = n − 1 and f_1 = 0 the solution is now more complicated, i.e.,

(3) f_n = n lg n + n A(lg n) + 1,

where lg x ≡ log2 x and A(u) = 1 − {u} − 2^{1−{u}}, in which {x} denotes the fractional part of x; e.g., {2.7} = 0.7. Observe that, by definition, A(x) = A({x}). In particular, this implies A(x + 1) = A({x + 1}) = A({x}) = A(x), so A(x) is periodic with period 1. Noting that lim_{x→1⁻} A(x) = −1 = A(0) shows that A(x) is continuous as well. This phenomenon of periodic terms appearing in the solution to divide-and-conquer recurrences is very common. These periodic terms often appear as coefficients of the second-order term as in (3), but occasionally they are coefficients of the leading term, e.g., with f_n = n A(lg n) + o(n). They are usually continuous but often are also non-differentiable, at least at a countably infinite number of points. As can be imagined, such terms cause difficulty when trying to derive solutions to recurrences.
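For Mergesort, the closed form (3) can be checked directly against the recurrence (2); this little sketch (our own, using only the definitions above) does so for a range of n:

```python
import math

# Worst-case Mergesort: recurrence (2) with e_n = n - 1 and f_1 = 0,
# against the closed form (3): f_n = n lg n + n A(lg n) + 1.
f = [0, 0]  # f[0] unused, f[1] = 0
for n in range(2, 2049):
    f.append(f[(n + 1) // 2] + f[n // 2] + n - 1)

def closed_form(n):
    u = math.log2(n)
    frac = u - math.floor(u)
    A = 1 - frac - 2 ** (1 - frac)  # the periodic term of (3)
    return n * u + n * A + 1

assert all(abs(closed_form(n) - f[n]) < 1e-6 * max(1, f[n]) for n in range(2, 2049))
print("closed form (3) matches the recurrence for 2 <= n <= 2048")
```

The match is exact up to floating-point error, for powers of two and in between alike.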
Papers [57, 58] associated with this chapter develop a method, based on the Mellin-Perron formula, for deriving solutions that capture these periodic terms. Because of the method used, these periodic terms are naturally derived in Fourier series form. Paper [62] uses a very similar technique to analyze digital sums. These are number-theoretic functions that are summations of functions of the digital representations of integers. The canonical example is the sum-of-digits function. Let v1(n) be the number of 1's in the binary representation of n, e.g., v1(13) = v1(1101₂) = 3 and

1. See [58] for a derivation, based on a derivation of a related function in [151, p. 400].

v1(24) = v1(11000₂) = 2. Set f_n = Σ_{i<n} v1(i) to be the number of 1's in the binary representations of all integers less than n. Delange [29] showed that

(4) f_n = (1/2) n lg n + n D(lg n),

where D(x) is defined by a Fourier series D(x) = Σ_k d_k e^{2πikx} (and is therefore periodic with period 1). Its Fourier coefficients are

d_0 = (lg π)/2 − 1/(2 log 2) − 1/4, and, for k ≠ 0, d_k = −(1/log 2) ζ(χ_k)/(χ_k (χ_k + 1)),

where χ_k = 2πik/log 2 and ζ(s) is the Riemann zeta function. Delange's derivation of (4) in [29] was quite complicated. [62] develops an elementary derivation of (4), again using Mellin-Perron based techniques. The method is very powerful and is employed to analyze many digital sums, not just the sum-of-digits function. Later, [16] extended the above techniques to analyze more generalized forms of both problems, i.e., multidimensional divide-and-conquer recurrences and weighted digital sums.

2. The basic technique

The basic technique follows from two observations. Some introductory definitions are first needed. Start by recalling the backward and forward difference operators on sequences f_n and g_n: ∇f_n = f_n − f_{n−1} and Δg_n = g_{n+1} − g_n. Set w_n = ∇(Δf)_n, the result of applying the double-difference operator to f_n. Simple algebra yields

(5) f_n = n f_1 + Σ_{k=1}^{n−1} (n − k) w_k.

As noted in [62, Theorem 2.1], the Mellin-Perron formula² states that:

THEOREM 1 (Mellin-Perron). Let ℜ(s) denote the real part of the complex number s. If, in the complex plane, there is some real c > 0 such that the Dirichlet series W(s) = Σ_{n≥1} w_n n^{−s} converges absolutely for ℜ(s) > c, then

(6) Σ_{k=1}^{n−1} (n − k) w_k = (n/2iπ) ∫_{c−i∞}^{c+i∞} W(s) n^s ds/(s(s + 1)).

2. Theorem 2.1 of [62] actually states a more general version of the Mellin-Perron formula. The version stated here is the special case of m = 2 in [62].
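Delange's formula (4) can be probed empirically: the quantity f_n/n − (1/2) lg n equals the periodic term D(lg n), which vanishes at powers of two and stays in a narrow negative band elsewhere, around the mean d_0 ≈ −0.1456. (The band bounds asserted below are empirical observations, not proved here.)

```python
import math

# f_n = sum_{i<n} v_1(i); by Delange's formula (4), f_n/n - (1/2) lg n = D(lg n).
f, vals = 0, []
for n in range(2, 4097):
    f += bin(n - 1).count("1")          # now f = sum_{i<n} v_1(i)
    if n & (n - 1) == 0:                # at n = 2^m the sum is exactly m * 2^(m-1)
        assert f == int(math.log2(n)) * n // 2
    vals.append(f / n - 0.5 * math.log2(n))

print(min(vals), max(vals))  # a narrow band; the maximum 0 occurs at n = 2^m
```

The printed band gives a direct picture of the size of the fluctuation D.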

Problem; f_n definition; f_n solution:

Worst-case Mergesort: f_{⌈n/2⌉} + f_{⌊n/2⌋} + n − 1; solution n lg n + n A(lg n) + 1.
Average-case Mergesort: f_{⌈n/2⌉} + f_{⌊n/2⌋} + n − γ_n; solution n lg n + n B(lg n) + O(1).
Variance of Mergesort: f_{⌈n/2⌉} + f_{⌊n/2⌋} + δ_n; solution n C(lg n) + o(n).
Sum-of-digits function: Σ_{i<n} v1(i); solution (1/2) n lg n + n D(lg n).
Triadic binary numbers: Σ_{i<n} h(i); solution n^{1+lg 3} E(lg n) − (1/4) n.
Number of odd binomial coefficients in the first n rows of Pascal's triangle: Σ_{i<n} 2^{v1(i)}; solution n^{lg 3} F(lg n).

FIGURE. Representative divide-and-conquer recurrences [58] and digital sums [62] analyzed by papers in this chapter. γ_n and δ_n are defined in [58]. v1(n) is the number of 1's in the binary representation of n, e.g., v1(13) = v1(1101₂) = 3 and v1(24) = v1(11000₂) = 2. v2(n) is the exponent of 2 in the prime decomposition of n, e.g., v2(13) = 0 and v2(24) = 3. h(n) evaluates a base-2 number as a base-3 number, i.e., h(Σ_i 2^{e_i}) = Σ_i 3^{e_i}; e.g., h(5) = h(101₂) = 3² + 3⁰ = 10 and h(13) = h(1101₂) = 3³ + 3² + 3⁰ = 37. All of the functions A, B, C, D, E, F are periodic with period 1, but many of them are not differentiable, at least on a dense set of points.

Combining (5) and (6) then gives that, if W(s) = Σ_{n≥1} w_n n^{−s} converges absolutely for ℜ(s) > c, then

(7) f_n = n f_1 + (n/2iπ) ∫_{c−i∞}^{c+i∞} W(s) n^s ds/(s(s + 1)).

A priori, this doesn't look helpful; the unknown sequence f_n has now been rewritten in terms of a complicated integral whose kernel I(s) = W(s) n^s/(s(s + 1)) contains a term W(s) which is itself a complicated function of the sequence f_n. What makes (7) interesting, though, is the following observation:

Observation 1: If f_n is defined by a divide-and-conquer recurrence or a digital sum, then W(s) has a simple form.

More particularly, [57, 58] show that if f_n is defined by the divide-and-conquer recurrence (2) with initial conditions e_0 = f_0 = e_1 = 0, then

(8) W(s) = Ξ(s)/(1 − 2^{−s}), where Ξ(s) = Σ_{n≥1} (∇Δe)_n n^{−s}.

This means that W(s) is now expressed in terms of the known sequence e_n instead of the unknown sequence f_n. Similarly, [62] shows that for many f_n that are defined by digital sums, W(s) can be written in terms of known functions. A canonical example (implicit in the proof of Theorem 3.1 in [62]) is that if f_n = Σ_{i<n} v1(i) is the sum-of-digits function, then

(9) W(s) = Σ_{n≥1} w_n n^{−s} = ζ(s) (2^s − 2)/(2^s − 1).

Similar derivations are shown in [62] for various other digital-sum type functions. Observation 1 only states that the kernel I(s) of the integral in (7) can be expressed in a simple calculable form. The integral still needs to be evaluated. The second major observation, shared by all of [16, 57, 58, 62], is that the integral can be evaluated using the Cauchy residue theorem. More specifically, by judicious choice of contours, [16, 57, 58, 62] show that, in all of the problems they address, the integral can be evaluated by summing residues in the half-plane ℜ(s) < c.

Observation 2: In the problems addressed, the integral (1/2iπ) ∫_{c−i∞}^{c+i∞} W(s) n^s ds/(s(s + 1)) appearing in (7) is, up to an asymptotically negligible error term, equal to the sum of the residues of the kernel I(s) = W(s) n^s/(s(s + 1)) at all its singularities in the left half-plane ℜ(s) < c.

As a simple example we revisit the basic divide-and-conquer recurrence for worst-case Mergesort, n ≥ 2, f_n = f_{⌈n/2⌉} + f_{⌊n/2⌋} + n − 1, with initial condition f_1 = 0. This has e_n = n − 1, so Δe_1 = Δe_2 = ⋯ = 1, hence (∇Δe)_1 = 1 and (∇Δe)_n = 0 for n ≥ 2. Thus Ξ(s) = Σ_{n≥1} (∇Δe)_n n^{−s} = 1, and W(s) = Ξ(s)/(1 − 2^{−s}) = 1/(1 − 2^{−s}).

Note that this W(s) converges absolutely for, e.g., ℜ(s) > 3, so we may set c = 3. Since f_1 = 0, (7) implies that

f_n/n = (1/2iπ) ∫_{3−i∞}^{3+i∞} n^s/(1 − 2^{−s}) · ds/(s(s + 1)).

By Observation 2, this is equal to the sum of the residues at all singularities of I(s) = n^s/((1 − 2^{−s}) s(s + 1)) in the left half-plane ℜ(s) < 3, plus, possibly, an asymptotically negligible term. In this particular case, [58, 57] further show that this error term is zero. We now note that, in that half-plane, the singularities of I(s) are:

(1) a double pole at s = 0, with residue lg n + 1/2 − 1/log 2;
(2) a simple pole at s = −1, with residue 1/n;
(3) simple poles at s = χ_k = 2kiπ/log 2, k ∈ ℤ∖{0}, with residues a_k e^{2ikπ lg n}, where a_k = (1/log 2) · 1/(χ_k(χ_k + 1)).

The sum of these residues is

lg n + A(lg n) + 1/n,

where A(u) has the explicit Fourier expansion A(u) = Σ_{k∈ℤ} a_k e^{2ikπu}, with a_0 = 1/2 − 1/log 2. Thus

f_n = n lg n + n A(lg n) + 1.

It can be verified that this A(u) is exactly the Fourier series representation of A(u) = 1 − {u} − 2^{1−{u}} in (3). Note that the periodic term A(lg n) comes from adding the residues of I(s) at the χ_k, which are the singularities (simple poles) of the factor 1/(1 − 2^{−s}) in I(s). Similarly, in the analysis of the sum-of-digits function, we have from (9) that, for f_n = Σ_{i<n} v1(i),

(10) W(s) = Σ_{n≥1} w_n n^{−s} = ζ(s) (2^s − 2)/(2^s − 1),

which converges absolutely for ℜ(s) > 2. Consider the corresponding I(s) = ζ(s) (2^s − 2) n^s/((2^s − 1) s(s + 1)). The only singularity of ζ(s) is a simple pole at s = 1, which is cancelled in I(s) by the simple zero of 2^s − 2 at s = 1. Since ζ(s) has no zeros on the imaginary axis, Delange's solution (4) can now be simply rederived by noting that I(s) has a double pole at s = 0 and simple

poles at s = 2kiπ/log 2, k ∈ ℤ∖{0}, and then summing all of their associated residues. Note that in both of the examples described above, the periodic term in the solution falls directly out of the analysis when adding up the residues at the poles s = 2kiπ/log 2, k ∈ ℤ∖{0}, because they can be interpreted as the terms of a Fourier series. This phenomenon appears in all of the problems addressed by the papers in this chapter.

3. Concluding Remarks

This chapter introduced and quickly sketched the Mellin-Perron techniques developed by Philippe Flajolet and his collaborators in [16, 58, 57, 62] for analyzing divide-and-conquer recurrences and digital sums. These techniques are particularly appropriate for deriving the periodic terms that often arise in these problems. The techniques are recipes, in that Observations 1 and 2 above are frequently applicable. This chapter glossed over some of the difficulties and extensions that appear in those papers, and we urge readers who want more details to read the papers. The most important are:

- Showing that Observation 2 is correct often requires more work. This usually involves proving that integrals over specific contours are asymptotically negligible. Such proofs are often problem-specific.
- Proving convergence of the derived Fourier series can be tricky (and sometimes uniform convergence doesn't occur).
- This chapter only showcased the version of the Mellin-Perron formula that applies to double summation. There are other versions that can be used for single summation, triple summation, etc.
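As a last numeric cross-check of the Mergesort example: the coefficients a_0 = 1/2 − 1/log 2 and a_k = 1/(log 2 · χ_k(χ_k + 1)) should be the Fourier coefficients of A(u) = 1 − {u} − 2^{1−{u}}. A direct numerical integration over one period (a small sketch of ours) confirms this:

```python
import cmath, math

def A(u):
    frac = u - math.floor(u)
    return 1 - frac - 2 ** (1 - frac)

N = 1 << 16  # grid size; the discrete sum below equals the DFT coefficient
def fourier_coeff(k):
    return sum(A(j / N) * cmath.exp(-2j * math.pi * k * j / N) for j in range(N)) / N

log2 = math.log(2)
a0 = 0.5 - 1 / log2
chi1 = 2j * math.pi / log2
a1 = 1 / (log2 * chi1 * (chi1 + 1))
err0 = abs(fourier_coeff(0) - a0)
err1 = abs(fourier_coeff(1) - a1)
print(err0, err1)  # both tiny
```

Since A is continuous on the circle, the grid sums converge fast and the agreement is many digits deep.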


PAPER 11


Chapter VI. COMMUNICATION PROTOCOLS


Flajolet's Work on Telecommunication Protocols and Collision Resolution Algorithms

Philippe Jacquet

1. Introduction

When we consider the contributions of Philippe Flajolet in the telecommunication domain, the first item that comes to mind is the wonderful "approximate counting" algorithm [5]. As discussed in a separate chapter, approximate counting allows one to estimate the number n of distinct elements in a multiset, presented in arbitrary order, with only a memory of logarithmic size. The simplicity and elegance of this algorithm make it an ideal tool to prevent cyber-attacks in the Internet, either in web servers or in edge routers. Indeed, the packet address headers of an internet flow can be seen as a multiset. Too many distinct clients at the same time toward the same server is the symptom of a cyber-attack. The algorithm can also be used to distinguish between heavy sessions and short sessions, providing a very efficient tool for traffic shaping. Anyhow, the approximate counting algorithm was not originally invented for telecommunication; it was only later that it was found pertinent to the internet. However, Flajolet worked on many algorithms originally designed for telecommunication during the 80's (more precisely, in the year 1985, his annus mirabilis). His favorite field of experimentation was the resolution of collisions. Some of his algorithms are now in standard home internet access equipment. Less obvious, some of the collision resolution algorithms analyzed by Flajolet have direct links with tree structures, and this may explain why he found so much interest in them. This chapter is devoted to the review of those algorithms. We first make a short review of what a protocol is and what a collision is in the universe of telecommunication. We will in passing describe the Aloha algorithm as the main originator of this technology.
Second, we will describe the tree collision resolution algorithm and the main contributions of Flajolet to the analysis of its performance and optimization.

2. Telecommunication Protocols, Aloha protocol

A protocol is a set of rules that a group of computers must follow in order to achieve a common objective; for instance, the common objective can be to enable an efficient way of communication between the computers. The question of designing good telecommunication protocols arose in the 60's, when computers started to be organized in networks. In the mid 60's, the University of Hawaii, dispatched over several islands, proposed to use radio links to connect its computers and to send information in packets. The main issue in such networks is that when two computers broadcast their packets at the same time, the data collide. In short, a collision makes the packet content unintelligible and the data are lost. There were several proposals to design collision-free protocols, but all turned out to need a centralized control, which is not practical in a distributed context. To be fully distributed, the schemes needed to solve a chicken-and-egg problem. In 1968, Abramson [1] had the luminous idea that blew away the vicious circle and laid the foundation of the Aloha protocol: "instead of avoiding collisions, why not just try to resolve them afterward". This idea has a corollary: "when you have a packet to send, just transmit it when you want". The collision resolution algorithm of the Aloha protocol (literally "welcome" in Hawaiian) is the simplest of all: "computers randomly schedule the retransmissions of their collided packets". This algorithm, confounding in its simplicity, is indeed the foundation of all modern protocols, such as Ethernet and WiFi. Before going further, let's introduce the basic model of collision resolution algorithms. Time is slotted; the time unit is the slot. Computers can transmit at most one packet per slot, and transmissions occur at the beginning of a slot.
All computers listen to the feedback at the end of each slot (there are alternative models with delayed feedback, for example in cable TV networks or satellite uplinks [15]). At every slot:

- if there is no contender: empty slot;
- if there is only one transmitter: successful slot;
- if there are two or more contenders: collision slot.

The detection of collisions gives rise to a great number of variations: it can either rely on an absence of acknowledgement from the intended receiver, as with Aloha or WiFi, or on a violation of code or energy level, as with Ethernet (IEEE [14]).

2.1. Performance of the Aloha collision resolution algorithm and the contribution of Philippe Flajolet. Let's illustrate the above model with the Aloha algorithm. We assume that new packets that are ready for transmission are generated according to a Poisson process of rate λ per slot. The Poisson process is the usual way to model telecommunication traffic; indeed, such traffic typically results from the superposition of merely independent and mostly unpredictable processes. We assume that a computer that has a packet waiting for retransmission reschedules a retransmission on the current slot with probability p. This is the simplest rescheduling process, being memoryless; in this case retransmission backoff times follow a geometric distribution. Assume that there are n packets waiting for retransmission; then the probability that

the current slot is an empty slot is

P(empty) = e^{−λ} (1 − p)^n.

The above identity reads the following way: e^{−λ} is the probability that no new packet is generated for this slot, and (1 − p)^n is the probability that no waiting packet is rescheduled on this slot. With the same way of reasoning we also get

P(success) = λ e^{−λ} (1 − p)^n + e^{−λ} n p (1 − p)^{n−1},

which reads exactly as follows: either the unique packet transmitted on this slot is a new packet (with probability λe^{−λ}) and no waiting packet is rescheduled, or no new packet is generated and exactly one waiting packet is rescheduled (probability np(1 − p)^{n−1}). Given the above estimates, it is clear that P(collision) → 1 when n → ∞, and also that P(success) → 0. Therefore the output of the system decays to zero when the number of backlogged packets tends to infinity. If N(t) denotes the number of backlogged packets at time t, then we have

E(N(t + 1) − N(t) | N(t) = n) = λ − P(success).

The average drift thus satisfies lim sup_{n→∞} E(N(t + 1) − N(t) | N(t) = n) > 0 as soon as λ > 0. From this property, Fayolle et al. [2] proved via probabilistic methods that E(N(t)) → ∞ when t → ∞. In other words, the collision resolution of the Aloha protocol, as simple as it is, is unfortunately unstable. The news is not so bad, because there is a simple way to stabilize the Aloha protocol. The value of p which maximizes the quantity P(success) is in fact p = (1 − λ)/N(t), and in this case lim_{n→∞} P(success) = e^{−1}. With this condition, the constrained Aloha protocol is stable as long as λ < e^{−1}. Thus the maximum attainable throughput λ_max, with constrained Aloha as a stable system, satisfies:

λ_max(constrained Aloha) = e^{−1}.

This number looks like a magic number: the resolution algorithm, with the extra cost of empty slots and collision slots, spares a proportion e^{−1} of slots for successful communication. The main difficulty is how to get a proper estimate of the number N(t) at each time slot in a distributed way.
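The limit P(success) → e^{−1} under the choice p = (1 − λ)/n is easy to check numerically (a quick sketch using the formulas above):

```python
import math

# Constrained Aloha: p = (1 - lam)/n makes the success probability tend to 1/e.
def p_success(lam, n):
    p = (1 - lam) / n
    return (lam * math.exp(-lam) * (1 - p) ** n
            + math.exp(-lam) * n * p * (1 - p) ** (n - 1))

for lam in (0.10, 0.25, 0.36):
    print(lam, p_success(lam, 10 ** 6))  # all close to 1/e = 0.3678...
```

Whatever the arrival rate λ, a large backlog with this retransmission probability saturates at the same magic value e^{−1}.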
It turns out that an exact estimate of the quantity N(t) is not needed, only an approximation that converges to the actual value as n increases. Many solutions based on Bayesian methods have been introduced. Philippe Flajolet et al. [6] proposed a much simpler method, based on leader election, that allows one to stabilize the Aloha algorithm at the magic number.

3. The tree collision resolution algorithm

3.1. Back to the origins. Stabilizing the Aloha protocol consists in introducing non-trivial rules that somewhat alleviate the apparent original simplicity of the

Figure 1. A tree collision resolution of an initial collision of four nodes: {A, B, C, D}.

scheme. Therefore there was room for defining a simple and stable collision resolution algorithm. This was done by Capetanakis [3] and by Vvedenskaya and Tsybakov [4]. In 1979, Capetanakis proposed a scheme based on a recursive tree structure. After each collision a rooted binary tree is created. Every contender independently tosses a coin:

- if "head", then the contender enters the first subtree;
- if "tail", then the contender waits for the resolution of the first subtree and enters the second subtree.

Figure 1 illustrates the tree resolution of a subset of four contenders {A, B, C, D}. To find the succession of slots one must read the tree in left-to-right depth-first order. In its first version, Capetanakis supposed that a central entity indicates to the contenders the current node in the tree. This looks again like an unacceptable complication for a distributed algorithm. In fact, it turns out via the work of Vvedenskaya and Tsybakov that no such central entity is needed. Indeed, the latter authors introduced in 1980 [4] the stack algorithm, described as follows. After each collision in the stack algorithm, every contender initializes a counter C(t) and tosses a coin:

- if "head" then C(t) = 0;
- if "tail" then C(t) = 1.

The counter C(t) then evolves the following way:

- If C(t) = 0, then the computer transmits on the next slot;
- If C(t) > 0 and the next slot is a collision, then C(t + 1) = C(t) + 1;
- If C(t) > 0 and the next slot is not a collision, then C(t + 1) = C(t) − 1.

Figure 2 illustrates the collision resolution of the stack algorithm on the subset {A, B, C, D}. Notice that with the same sequence of coin tosses for each contender, the stack algorithm resolves the collision with the same sequence of slots as the tree algorithm. This analogy is not fortuitous: the stack algorithm is exactly a distributed implementation of the tree algorithm. Stack and tree resolution are the same algorithm.

Figure 2. A stack resolution of the initial collision of {A, B, C, D}: slot by slot, the table shows the feedback (collision, empty, success) and the sets of contenders at each counter value C(t).

3.2. A tree resolution is a trie. Philippe Flajolet made a spectacular breakthrough in the analysis of the performance of the tree algorithm, for a very good reason: a tree resolution is exactly a trie. Assume that you identify each contender by its sequence of coin tosses. This sequence is a binary key. Take the contender A: its binary key is (0, 1, 0, ...). It turns out that this binary key is exactly the access path to the leaf that contains the successful transmission of contender A. In other words, the tree resolution is exactly the trie of the initial contenders identified by their coin-tossing sequences. Assuming that no new packet is generated during the resolution, the average number of slots needed in a collision resolution of n contenders is exactly the average size of a binary trie with n records. Let L_n be this average size; we have L_0 = L_1 = 1. Since Knuth we know that, for n > 1:

L_n = 1 + 2^{−n} Σ_{k=0}^{n} C(n, k) (L_k + L_{n−k}).
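This recurrence can be iterated directly (solving for L_n, which appears on both sides through the k = 0 and k = n terms); the observed slope L_n/n approaches 2/log 2 ≈ 2.885, consistent with the asymptotics derived next. A small sketch:

```python
from math import comb, log

# L_0 = L_1 = 1;  L_n = 1 + 2^{-n} sum_{k=0..n} C(n,k)(L_k + L_{n-k})  for n > 1.
# The k = 0 and k = n terms contain L_n itself, so solve the recurrence for L_n.
N = 400
L = [1.0, 1.0]
for n in range(2, N + 1):
    inner = sum(comb(n, k) * 2 * L[k] for k in range(1, n))  # k and n-k pairs are symmetric
    L.append((1 + 2 ** (1 - n) + inner / 2 ** n) / (1 - 2 ** (1 - n)))

print(L[2], L[N] / N, 2 / log(2))  # L_2 = 5 slots; the slope tends to 2/log 2 = 2.885...
```

For instance, resolving a two-packet collision takes 5 slots on average, and the cost per contender settles quickly near 2.885 slots.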

Introducing the Poisson generating function L(z) = Σ_n L_n (z^n/n!) e^{−z}, Flajolet obtains the functional equation

L(z) = 1 − 2(1 + z)e^{−z} + 2 L(z/2).

Using the Mellin transform L*(s) = ∫_0^∞ L(x) x^{s−1} dx, defined for −2 < ℜ(s) < −1, Flajolet [?] gives the closed-form expression

L*(s) = 2(1 + s)Γ(s)/(2^{1+s} − 1).

And it turns out, via singularity analysis of the inverse Mellin transform L(z) = (1/2iπ) ∫_{ℜ(s)=c} L*(s) z^{−s} ds, collecting the poles that occur for ℜ(s) ≥ −1 (see [13] for the technical details), that

L(z) = (2z/log 2) (1 + Σ_{k∈ℤ∖{0}} s_k Γ(s_k − 1) z^{s_k}) + o(z), with s_k = 2ikπ/log 2.

Noticing that the contribution of the s_k leads to a periodic term (in log z) of amplitude not more than of order 10^{−6}, we have

L(z) = (2z/log 2)(1 + O(10^{−6})).

Flajolet could afford the luxury of extending the analysis to the case where the coin toss is biased, with probabilities (p, q), p + q = 1:

L(z) = 1 − 2(1 + z)e^{−z} + L(pz) + L(qz),

and in this case we get

L(z) = (2z/(−p log p − q log q)) (1 + r_2(log z)),

the remaining term r_2 being either o(1) (in general) or, when log p/log q is rational, a periodic function of very small amplitude. It turns out that the result can also be used to characterize the individual coefficients, L_n = L(n) + o(n), by a direct application of depoissonization theorems (unknown at this time).

3.3. Performance analysis of the tree algorithm with blocked access. The above estimates apply to the specific case when no new packets participate in the resolution of an initial collision. We call this version the "blocked access" tree algorithm. Time is divided into epochs. To detect that the current collision is resolved and that a new collision can start, each computer just needs to manage a counter K(t) as follows: when the next slot is a collision, then K(t + 1) = K(t) + 1; otherwise K(t + 1) = max{0, K(t) − 1}. When K(t) = 0, the computers transmit the new packets that have been generated during the last epoch.
Let l(i) and N(i) respectively denote the length of the ith epoch and the number of new packets generated during this epoch. The variable N(i) conditioned

by l(i) is a Poisson variable of mean λl(i). In other words,

P(N(i) = n \mid l(i) = l) = \frac{(\lambda l)^n}{n!} e^{-\lambda l}.

Thus we have the exact identity

E(l(i+1) \mid l(i) = l) = \sum_n L_n \frac{(\lambda l)^n}{n!} e^{-\lambda l} = L(\lambda l).

From the asymptotic analysis it turns out that

\lim_{l \to \infty} \frac{E(l(i+1) \mid l(i) = l)}{l} = \lim_{l \to \infty} \frac{L(\lambda l)}{l} > 1

when λ > \frac{\log 2}{2} (1 + O(10^{-6})). In this case the quantity l(i) tends to diverge as i → ∞; therefore the tree algorithm is unstable. Conversely, when λ < \frac{\log 2}{2} (1 - O(10^{-6})),

\lim_{l \to \infty} \frac{E(l(i+1) \mid l(i) = l)}{l} < 1

and the quantity l(i) remains finite on average as i → ∞: the tree algorithm is stable. In other words the tree algorithm with blocked access is stable, and its maximum achievable throughput satisfies the inequality

\frac{\log 2}{2} (1 - O(10^{-6})) < \lambda_{max}(\text{binary tree}) < \frac{\log 2}{2} (1 + O(10^{-6})).

There is no exact evaluation of the quantity \lambda_{max}(\text{binary tree}) because of the oscillating terms, of order |\Gamma(2i\pi/\log 2)| < 10^{-6}. Thus the estimate \lambda_{max} = 0.346574 \pm 10^{-6} is valid. When \log p / \log q is irrational, the quantity \lambda_{max} is exactly \frac{-p \log p - q \log q}{2}, but in this case the value is smaller than \lambda_{max} in the unbiased case. And in any case it is smaller than the e^{-1} of constrained Aloha.

3.4. Q-ary tree algorithm. An obvious question: can we adapt the tree algorithm to Q-ary trees? The answer is yes (see figure 3); the idea came to Flajolet and Mathys [12]. The collision algorithm requires a Q-sided coin toss. After each collision the contenders set C(t) to a random integer between 0 and Q-1 and let it evolve as follows: if C(t) = 0 then the computer transmits on the next slot; if C(t) > 0 and the next slot is a collision, then C(t+1) = C(t) + Q - 1; if C(t) > 0 and the next slot is not a collision, then C(t+1) = C(t) - 1. The function L(z) now satisfies the general equation

L(z) = 1 - Q(1+z)e^{-z} + \sum_{i=0}^{Q-1} L(p_i z),

where p_i is the probability that the coin toss produces the integer i after a collision. Correspondingly,

L^*(s) = -\frac{Q(1+s)\Gamma(s)}{1 - \sum_{i=0}^{Q-1} p_i^{-s}},
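The drift argument behind this stability condition can be illustrated by iterating the epoch map l ↦ L(λl) with a truncated Poisson transform (our own sketch; the truncation depth, starting point, and sample values of λ are assumptions):

```python
from math import comb, exp

def trie_sizes(nmax):
    # L_n = 1 + sum_k 2^(-n) C(n,k) (L_k + L_{n-k}), with L_0 = L_1 = 1
    L = [1.0, 1.0]
    for n in range(2, nmax + 1):
        s = 1.0 + (2 * L[0] + sum(comb(n, k) * (L[k] + L[n - k])
                                  for k in range(1, n))) / 2 ** n
        L.append(s / (1 - 2 / 2 ** n))
    return L

def L_poisson(z, L):
    # truncated Poisson transform: sum_n L_n z^n/n! e^{-z}
    term, total = exp(-z), 0.0
    for n, Ln in enumerate(L):
        total += Ln * term
        term *= z / (n + 1)
    return total

L = trie_sizes(200)

def epoch_iterate(lam, l0=20.0, steps=15):
    # average epoch length evolves as E(l(i+1) | l(i) = l) = L(lam * l)
    l = l0
    for _ in range(steps):
        l = L_poisson(lam * l, L)
    return l

stable = epoch_iterate(0.30)    # below (log 2)/2 ~ 0.3466: epochs shrink
unstable = epoch_iterate(0.40)  # above the threshold: epochs keep growing
print(stable, unstable)
```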

Figure 3. A 3-ary tree collision resolution of an initial collision of four nodes {A, B, C, D}: the first split gives {A, B}, {C}, {D}, and {A, B} splits in turn into {A}, {B}.

which leads to an asymptotic estimate formally similar to the binary case:

L(z) = \frac{Qz}{-\sum_i p_i \log p_i} (1 + r_Q(\log z)),

where in general r_Q(x) is o(1) as x → ∞, or is periodic when the \log p_i are commensurable. However the amplitude is still of very small order, although slowly increasing with Q. As in the binary case, for a given value of Q the minimum is attained in the unbiased case p_i = 1/Q, and the error term \epsilon(Q) is of order \Gamma(2i\pi/\log Q) \approx \exp(-\pi^2/\log Q). Thus

\lambda_{max}(Q\text{-ary tree}) = \frac{\log Q}{Q} (1 + \epsilon(Q)), with |\epsilon(Q)| \approx \exp(-\pi^2/\log Q).

If the quantity Q were not constrained to integer values, then the maximum value of \lambda_{max}(Q\text{-ary tree}) would be e^{-1}, attained for Q = e. In fact the maximum value for integer Q is attained at Q = 3, the closest integer to e, and \lambda_{max}(\text{ternary tree}) = \log 3 / 3 \approx 0.36620, which falls short of e^{-1} \approx 0.36788 by very little. Notice that within at least six digits we have \lambda_{max}(\text{quaternary tree}) = \lambda_{max}(\text{binary tree}), since \log 4 / 4 = (\log 2)/2, and beyond, \lambda_{max}(Q\text{-ary tree}) is a decreasing function of the integer Q.
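The formula λ_max(Q-ary tree) ≈ (log Q)/Q, ignoring the tiny ε(Q), can be tabulated in a few lines (a sketch; lam_max is our name):

```python
from math import log, e

def lam_max(Q):
    # throughput of the Q-ary blocked-access tree algorithm, up to the
    # tiny oscillating term eps(Q) of order exp(-pi^2 / log Q)
    return log(Q) / Q

table = {Q: lam_max(Q) for Q in range(2, 7)}
best = max(table, key=table.get)
print(best)       # 3, the closest integer to e
print(table[3])   # log(3)/3 ~ 0.36620, just short of 1/e ~ 0.36788
print(1 / e)      # the real-Q optimum, attained at Q = e
```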

The following table in figure 4, borrowed from [12], summarizes the different values of \lambda_{max} as Q varies, in the uniform coin-tossing case.

Figure 4. Values of \lambda_{max} versus Q, uniform coin-tossing case, blocked access mode (numerically \lambda_{max}(Q) \approx (\log Q)/Q, i.e. about 0.3466, 0.3662, 0.3466, 0.3219, 0.2986 for Q = 2, ..., 6, up to tiny oscillating terms).

The question remains: can we have a \lambda_{max} larger than e^{-1}, or is this last quantity an ultimate limit? For long this was indeed a conjecture, and we will see how the works of Philippe Flajolet show that e^{-1} is not an ultimate limit.

4. The free access tree algorithm

The previous analysis was made under the implicit hypothesis that the tree algorithm works in blocked access mode. The free access mode exists as well: in this case any newly generated packet is tentatively transmitted in the very next slot after its generation time. The consequence is that newly generated packets can participate in the current collision resolution, so there is no need to manage a global counter K(t). Despite this sensible difference, the algorithm remains the same, i.e. after each collision the contenders initialize a counter C(t) which evolves exactly as before.

4.1. Resolution of the free access recursion. The analogy with the trie analysis is perfect as long as the new packets have blocked access. If we leave free access to new packets, i.e. they are transmitted on the first current slot, then the analysis significantly departs from the trie orthodoxy. Philippe Flajolet developed this analysis as well [9]. Although more complicated, it can be tackled with some elegance via generating functions. If x and y denote the numbers of new arrivals on the slots starting the left and right subtrees, the situation for n > 1 is illustrated by figure 5:

L_n = 1 + \sum_{k=0}^{n} \binom{n}{k} 2^{-n} \sum_{x,y} P(x) P(y) (L_{k+x} + L_{n-k+y}).

For the analogy with tries this would correspond to the possibility of new insertions via internal nodes, instead of the usual root insertion. The packet arrivals being Poisson of mean λ, we have P(x) = \frac{\lambda^x}{x!}
e^{-\lambda}. The generating function L(z) considerably simplifies this equation:

L(z) = 1 - 2L(\lambda)(1+z)e^{-z} - L'(\lambda) z e^{-z} + 2L(z/2 + \lambda).
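The free-access recursion for L_n can be mirrored by a direct Monte Carlo simulation, in which each subtree slot picks up an independent Poisson(λ) batch of new packets exactly as in the recursion (a sketch with our own names, depth guard, and parameter values):

```python
import random
from math import exp

def poisson(lam, rng):
    # inverse-transform Poisson sampler, adequate for small lam
    u, k, p = rng.random(), 0, exp(-lam)
    c = p
    while u > c:
        k += 1
        p *= lam / k
        c += p
    return k

def resolve(n, lam, rng, depth=0):
    # slots needed to resolve a collision among n packets in free access
    # mode: each of the two subtree slots picks up a fresh Poisson(lam)
    # batch of new packets, mirroring the recursion for L_n above
    if n <= 1 or depth > 400:   # depth guard for rare very deep excursions
        return 1
    left = sum(rng.random() < 0.5 for _ in range(n))
    return (1 + resolve(left + poisson(lam, rng), lam, rng, depth + 1)
              + resolve(n - left + poisson(lam, rng), lam, rng, depth + 1))

rng = random.Random(42)
trials = 20000
m_low  = sum(resolve(2, 0.10, rng) for _ in range(trials)) / trials
m_high = sum(resolve(2, 0.30, rng) for _ in range(trials)) / trials
print(m_low, m_high)  # the mean grows sharply as lam approaches lambda_max
```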

Figure 5. Free arrival collision resolution.

When the coin tosses are biased we have a sensibly similar equation:

L(z) = 1 - 2L(\lambda)(1+z)e^{-z} - L'(\lambda) z e^{-z} + L(pz + \lambda) + L(qz + \lambda).

How to solve this equation? Let us rewrite it as

(I - H)L(z) = 1 - 2L(\lambda)f(z) - L'(\lambda)g(z),

with f(z) = (1+z)e^{-z}, g(z) = z e^{-z}, and H the linear operator defined on analytic functions \varphi(z) by

H\varphi(z) = \varphi(pz + \lambda) + \varphi(qz + \lambda);

I is the identity operator. Inverting I - H by the classic formula (I - H)^{-1} = I + H + H^2 + \cdots unfortunately does not work, because the series diverges. However the resolution comes from a similar but convergent series:

L(z) = 1 - 2L(\lambda)Tf(z) - L'(\lambda)Tg(z),

with the operator T defined on any analytic function \varphi(z) by

T\varphi(z) = \sum_{k \ge 0} \big( H^k\varphi(z) - H^k\varphi(0) - z (H^k\varphi)'(0) \big).

This series converges because, taking the second derivative of L(z), we have the identity

(I - H'')L''(z) = -2L(\lambda)f''(z) - L'(\lambda)g''(z)

with H''\varphi(z) = p^2 \varphi(pz + \lambda) + q^2 \varphi(qz + \lambda). The iteration (I - H'')^{-1} = I + H'' + (H'')^2 + \cdots converges under mild conditions, since p^2 + q^2 < 1. Therefore

L''(z) = -2L(\lambda) \sum_{k \ge 0} (H'')^k f''(z) - L'(\lambda) \sum_{k \ge 0} (H'')^k g''(z).

Using the identity L(z) = 1 + \int_0^z \int_0^y L''(x) \, dx \, dy (there is no linear term in z) and the fact that

\int_0^z \int_0^y (H'')^k \varphi''(x) \, dx \, dy = H^k\varphi(z) - H^k\varphi(0) - z (H^k\varphi)'(0)

leads to the claimed result.
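In the symmetric case p = q = 1/2 the iterates collapse to H^k φ(z) = 2^k φ(z/2^k + b_k), with b_0 = 0 and b_{k+1} = b_k/2 + λ, so T and the determinant D(λ) of the linear system can be evaluated numerically and the first root of D located by bisection. A sketch (names, truncation depth, and bracket are ours; with this reconstruction the root comes out near 0.36):

```python
from math import exp

f  = lambda z: (1 + z) * exp(-z)   # f(z) = (1+z) e^{-z}
fp = lambda z: -z * exp(-z)        # f'(z)
g  = lambda z: z * exp(-z)         # g(z) = z e^{-z}
gp = lambda z: (1 - z) * exp(-z)   # g'(z)

def T_pair(phi, phip, z, lam, K=30):
    # symmetric case p = q = 1/2: H^k phi(z) = 2^k phi(z/2^k + b_k),
    # b_0 = 0, b_{k+1} = b_k/2 + lam; returns (T phi(z), (T phi)'(z))
    T, Tp, b = 0.0, 0.0, 0.0
    for k in range(K):
        a = 2.0 ** -k
        T += (phi(a * z + b) - phi(b) - a * z * phip(b)) / a
        Tp += phip(a * z + b) - phip(b)
        b = b / 2 + lam
    return T, Tp

def D(lam):
    # D(lam) = (1 + 2Tf)(1 + (Tg)') - 2 Tg (Tf)', all taken at z = lam
    Tf, Tfp = T_pair(f, fp, lam, lam)
    Tg, Tgp = T_pair(g, gp, lam, lam)
    return (1 + 2 * Tf) * (1 + Tgp) - 2 * Tg * Tfp

# bisection for the first root of D: lambda_max of the free-access binary tree
lo, hi = 0.2, 0.45   # assumed bracket with D(lo) > 0 > D(hi)
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if D(mid) > 0 else (lo, mid)
root = (lo + hi) / 2
print(root)  # ~ 0.36
```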

It remains to determine L(λ) and L'(λ); this is done by simple identification in the linear system

\begin{pmatrix} 1 + 2Tf(\lambda) & Tg(\lambda) \\ 2(Tf)'(\lambda) & 1 + (Tg)'(\lambda) \end{pmatrix} \begin{pmatrix} L(\lambda) \\ L'(\lambda) \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.

In passing, (T\varphi)'(z) = \sum_{k \ge 0} \big( (H^k\varphi)'(z) - (H^k\varphi)'(0) \big). The vector (L(λ), L'(λ)) is determined as long as the above matrix is nonsingular, i.e. as long as

D(\lambda) = (1 + 2Tf(\lambda))(1 + (Tg)'(\lambda)) - 2Tg(\lambda)(Tf)'(\lambda) \ne 0.

If this condition is fulfilled we have

L(\lambda) = \frac{1 + (Tg)'(\lambda)}{D(\lambda)}.

Flajolet could also give the asymptotic estimate of L_n as n → ∞ (it turns out to be linear in n, with some potential periodic terms), but this tour de force was not really necessary for the estimation of the λ_max of the free access tree algorithm. Indeed L(λ) is exactly the average duration of a collision resolution interval with an initial collision of Poisson multiplicity. If the average L(λ) is finite, then the system, having ergodic renewal times, is stable. Therefore λ_max is the first value of λ that makes L(λ) diverge, i.e. λ_max is the first nonnegative root of D(λ): D(λ_max) = 0. Therefore the quantity λ_max, as the root of a purely analytic function, is known up to arbitrary accuracy, contrary to the blocked access case:

\lambda_{max}(\text{free binary tree}) = 0.360177...

Notice that although it is significantly larger than λ_max(binary tree), it is smaller than λ_max(ternary tree). We will return to this comment when we discuss the free Q-ary tree algorithms.

The above methodology can be applied to other parameters of interest. For example let C_n be the average number of contenders in the resolution of a collision of initial multiplicity n. We have C_0 = 0 and C_1 = 1, but when n > 1 we have C_n > n. In fact

C_n = \sum_{k=0}^{n} \binom{n}{k} 2^{-n} \sum_{x,y} P(x) P(y) (C_{k+x} + C_{n-k+y}),

which after some algebra translates into the functional equation

(I - H)C(z) = g(z) - 2C(\lambda)f(z) - C'(\lambda)g(z),

with C(z) = \sum_n C_n \frac{z^n}{n!} e^{-z}. Since C(z) = z + \int_0^z \int_0^y C''(x) \, dx \, dy, it resolves into

C(z) = z + Tg(z) - 2C(\lambda)Tf(z) - C'(\lambda)Tg(z)

and

\begin{pmatrix} C(\lambda) \\ C'(\lambda) \end{pmatrix} = \begin{pmatrix} 1 + 2Tf(\lambda) & Tg(\lambda) \\ 2(Tf)'(\lambda) & 1 + (Tg)'(\lambda) \end{pmatrix}^{-1} \begin{pmatrix} \lambda + Tg(\lambda) \\ 1 + (Tg)'(\lambda) \end{pmatrix}.

Thus

C(\lambda) = \frac{1}{D(\lambda)} \big( (1 + (Tg)'(\lambda))(\lambda + Tg(\lambda)) - Tg(\lambda)(1 + (Tg)'(\lambda)) \big) = \lambda \, \frac{1 + (Tg)'(\lambda)}{D(\lambda)} = \lambda L(\lambda).

Notice that this identity was predictable, since L(λ) is precisely the average renewal period: the average number of packets per renewal period must be λL(λ) in order to keep the arrival rate equal to λ. We will see in the following section that the methodology can be extended to some nontrivial cases that give insight into the performance of the tree algorithm as a telecommunication system.

4.2. Delivery delay analysis of the free tree algorithm. Having determined the condition of stability of a communication system, the telecom engineers are primarily interested in the packet delivery delay. The delivery delay is the time that separates the generation of a packet from its successful reception by its intended receiver. The system can be stable in general but still prone to unacceptably large delivery delays. Therefore the analysis of the delivery delay is a key point, and this was done in [?], noticeable in the literature as the first full analysis of the performance of a collision resolution algorithm. Let W_n be the average cumulated delay of the packets participating in the resolution of a collision of initial multiplicity n. Clearly W_0 = 0 and W_1 = 1. When n > 1, as illustrated in figure 6, we have the recursion

W_n = n + \sum_{k=0}^{n} \binom{n}{k} 2^{-n} \sum_{x,y} P(x) P(y) \big( W_{k+x} + (n-k) L_{k+x} + W_{n-k+y} \big).

As an obvious operation we introduce the exponential generating function W(z) = \sum_n W_n \frac{z^n}{n!} e^{-z}, which satisfies the functional equation

(1) (I - H)W(z) = z + qzL(pz + \lambda) - qL(\lambda)g(z) - 2W(\lambda)f(z) - W'(\lambda)g(z).

The key to the analysis is the remark that the collision resolution intervals form a renewal sequence, and that W(λ) is in fact the average cumulated delay per resolution interval.
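The identity C(λ) = λL(λ) can be confirmed mechanically by solving the two 2×2 systems, which share the same matrix (symmetric case p = q = 1/2; a sketch with our own names and sample λ):

```python
from math import exp

f, fp = (lambda z: (1 + z) * exp(-z)), (lambda z: -z * exp(-z))
g, gp = (lambda z: z * exp(-z)), (lambda z: (1 - z) * exp(-z))

def T_pair(phi, phip, z, lam, K=30):
    # symmetric case: H^k phi(z) = 2^k phi(z/2^k + b_k), b_{k+1} = b_k/2 + lam
    T, Tp, b = 0.0, 0.0, 0.0
    for k in range(K):
        a = 2.0 ** -k
        T += (phi(a * z + b) - phi(b) - a * z * phip(b)) / a
        Tp += phip(a * z + b) - phip(b)
        b = b / 2 + lam
    return T, Tp

def solve2(M, v):
    # solve a 2x2 linear system by Cramer's rule
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return ((v[0] * M[1][1] - v[1] * M[0][1]) / det,
            (M[0][0] * v[1] - M[1][0] * v[0]) / det)

lam = 0.25
Tf, Tfp = T_pair(f, fp, lam, lam)
Tg, Tgp = T_pair(g, gp, lam, lam)
M = [[1 + 2 * Tf, Tg], [2 * Tfp, 1 + Tgp]]
Llam, _ = solve2(M, (1.0, 0.0))           # system for (L(lam), L'(lam))
Clam, _ = solve2(M, (lam + Tg, 1 + Tgp))  # system for (C(lam), C'(lam))
print(Clam, lam * Llam)  # equal: C(lam) = lam * L(lam)
```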
Since C(λ) = λl(λ) is the average number of packets involved per collision resolution interval, thus the unconditional average packet delay is W (λ) C(λ). Using the functional resolution methodology, we get from equation (1) the expression W (z) = z + qth(z) ql(λ)tg(z) 2W (λ)tf(z) W (λ)tg(z),

Figure 6. The cumulated delays in a collision resolution.

with h(z) = zL(pz + λ). Thus

\begin{pmatrix} W(\lambda) \\ W'(\lambda) \end{pmatrix} = \begin{pmatrix} 1 + 2Tf(\lambda) & Tg(\lambda) \\ 2(Tf)'(\lambda) & 1 + (Tg)'(\lambda) \end{pmatrix}^{-1} \begin{pmatrix} \lambda + qTh(\lambda) \\ 1 + q(Th)'(\lambda) \end{pmatrix}

and

W(\lambda) = \frac{(1 + (Tg)'(\lambda))(\lambda + qTh(\lambda)) - Tg(\lambda)(1 + q(Th)'(\lambda))}{D(\lambda)}.

5. Q-ary free access tree algorithm

A last and nagging question remains: is e^{-1} an actual upper bound on all possible maximum throughputs attainable by a collision resolution algorithm? Flajolet, with Mathys [?], was able to give an original answer to this question. The answer lies in the extension of the free access algorithm to the Q-ary free access tree algorithm. To make the analysis short: the generating function L(z) now satisfies

L(z) = 1 - QL(\lambda)f(z) - L'(\lambda)g(z) + \sum_{i=0}^{Q-1} L(p_i z + \lambda),

or

(I - H)L(z) = 1 - QL(\lambda)f(z) - L'(\lambda)g(z),

by appropriately redefining the operator H:

H\varphi(z) = \sum_{i=0}^{Q-1} \varphi(p_i z + \lambda).

This redefinition implies a redefinition of the operator T, and using the same argument as for the binary case we determine λ_max as the first root of D(λ) = 0, with

D(\lambda) = (1 + QTf(\lambda))(1 + (Tg)'(\lambda)) - QTg(\lambda)(Tf)'(\lambda).

It turns out that the maximum values of λ_max are obtained when the coin tossing is uniform, i.e. when p_i = 1/Q for all i. The overall maximum is obtained with Q = 3, with λ_max ≈ 0.4016, which is larger than e^{-1} by about 10%. This way Flajolet could disprove the conjecture that e^{-1} would be an absolute barrier on stability conditions for collision resolution algorithms in basic slotted models.
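With the redefined operator H and uniform p_i = 1/Q, the iterates again collapse to H^k φ(z) = Q^k φ(z/Q^k + b_k) with b_{k+1} = b_k/Q + λ, so D(λ) can be evaluated for each Q and its first root located by a simple scan. A sketch (names, step size, and truncation are ours; with this reconstruction the binary root lands near 0.360 and the ternary root near 0.40):

```python
from math import exp

f, fp = (lambda z: (1 + z) * exp(-z)), (lambda z: -z * exp(-z))
g, gp = (lambda z: z * exp(-z)), (lambda z: (1 - z) * exp(-z))

def T_pair(phi, phip, z, lam, Q, K=18):
    # uniform Q-ary case: H^k phi(z) = Q^k phi(z/Q^k + b_k),
    # with b_0 = 0 and b_{k+1} = b_k/Q + lam
    T, Tp, b = 0.0, 0.0, 0.0
    for k in range(K):
        a = float(Q) ** -k
        T += (phi(a * z + b) - phi(b) - a * z * phip(b)) / a
        Tp += phip(a * z + b) - phip(b)
        b = b / Q + lam
    return T, Tp

def D(lam, Q):
    # D = (1 + Q*Tf)(1 + (Tg)') - Q*Tg*(Tf)', evaluated at z = lam
    Tf, Tfp = T_pair(f, fp, lam, lam, Q)
    Tg, Tgp = T_pair(g, gp, lam, lam, Q)
    return (1 + Q * Tf) * (1 + Tgp) - Q * Tg * Tfp

def lam_max(Q):
    # scan for the first sign change of D, i.e. its first positive root
    lam, step, prev = 0.05, 0.005, D(0.05, Q)
    while lam < 0.6:
        lam += step
        cur = D(lam, Q)
        if prev > 0 >= cur:
            return lam - step / 2
        prev = cur
    return None

r2, r3 = lam_max(2), lam_max(3)
print(r2, r3)  # binary ~ 0.36; ternary exceeds 1/e ~ 0.3679
```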


More information

LAWS OF LARGE NUMBERS AND TAIL INEQUALITIES FOR RANDOM TRIES AND PATRICIA TREES

LAWS OF LARGE NUMBERS AND TAIL INEQUALITIES FOR RANDOM TRIES AND PATRICIA TREES LAWS OF LARGE NUMBERS AND TAIL INEQUALITIES FOR RANDOM TRIES AND PATRICIA TREES Luc Devroye School of Computer Science McGill University Montreal, Canada H3A 2K6 luc@csmcgillca June 25, 2001 Abstract We

More information

Notes 6 : First and second moment methods

Notes 6 : First and second moment methods Notes 6 : First and second moment methods Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Roc, Sections 2.1-2.3]. Recall: THM 6.1 (Markov s inequality) Let X be a non-negative

More information

Growing and Destroying Catalan Stanley Trees

Growing and Destroying Catalan Stanley Trees Discrete Mathematics and Theoretical Computer Science DMTCS vol. 20:1, 2018, #11 Growing and Destroying Catalan Stanley Trees Benjamin Hackl 1 Helmut Prodinger 2 arxiv:1704.03734v3 [math.co] 26 Feb 2018

More information

Analytic Pattern Matching: From DNA to Twitter. Information Theory, Learning, and Big Data, Berkeley, 2015

Analytic Pattern Matching: From DNA to Twitter. Information Theory, Learning, and Big Data, Berkeley, 2015 Analytic Pattern Matching: From DNA to Twitter Wojciech Szpankowski Purdue University W. Lafayette, IN 47907 March 10, 2015 Information Theory, Learning, and Big Data, Berkeley, 2015 Jont work with Philippe

More information

COS597D: Information Theory in Computer Science October 19, Lecture 10

COS597D: Information Theory in Computer Science October 19, Lecture 10 COS597D: Information Theory in Computer Science October 9, 20 Lecture 0 Lecturer: Mark Braverman Scribe: Andrej Risteski Kolmogorov Complexity In the previous lectures, we became acquainted with the concept

More information

Johns Hopkins Math Tournament Proof Round: Automata

Johns Hopkins Math Tournament Proof Round: Automata Johns Hopkins Math Tournament 2018 Proof Round: Automata February 9, 2019 Problem Points Score 1 10 2 5 3 10 4 20 5 20 6 15 7 20 Total 100 Instructions The exam is worth 100 points; each part s point value

More information

Motivation for Arithmetic Coding

Motivation for Arithmetic Coding Motivation for Arithmetic Coding Motivations for arithmetic coding: 1) Huffman coding algorithm can generate prefix codes with a minimum average codeword length. But this length is usually strictly greater

More information

A new dichotomic algorithm for the uniform random generation of words in regular languages

A new dichotomic algorithm for the uniform random generation of words in regular languages A new dichotomic algorithm for the uniform random generation of words in regular languages Johan Oudinet 1,2, Alain Denise 1,2,3, and Marie-Claude Gaudel 1,2 1 Univ Paris-Sud, Laboratoire LRI, UMR8623,

More information

The Subtree Size Profile of Plane-oriented Recursive Trees

The Subtree Size Profile of Plane-oriented Recursive Trees The Subtree Size Profile of Plane-oriented Recursive Trees Michael Fuchs Department of Applied Mathematics National Chiao Tung University Hsinchu, Taiwan ANALCO11, January 22nd, 2011 Michael Fuchs (NCTU)

More information

means is a subset of. So we say A B for sets A and B if x A we have x B holds. BY CONTRAST, a S means that a is a member of S.

means is a subset of. So we say A B for sets A and B if x A we have x B holds. BY CONTRAST, a S means that a is a member of S. 1 Notation For those unfamiliar, we have := means equal by definition, N := {0, 1,... } or {1, 2,... } depending on context. (i.e. N is the set or collection of counting numbers.) In addition, means for

More information

1 Two-Way Deterministic Finite Automata

1 Two-Way Deterministic Finite Automata 1 Two-Way Deterministic Finite Automata 1.1 Introduction Hing Leung 1 In 1943, McCulloch and Pitts [4] published a pioneering work on a model for studying the behavior of the nervous systems. Following

More information

Basic Principles of Lossless Coding. Universal Lossless coding. Lempel-Ziv Coding. 2. Exploit dependences between successive symbols.

Basic Principles of Lossless Coding. Universal Lossless coding. Lempel-Ziv Coding. 2. Exploit dependences between successive symbols. Universal Lossless coding Lempel-Ziv Coding Basic principles of lossless compression Historical review Variable-length-to-block coding Lempel-Ziv coding 1 Basic Principles of Lossless Coding 1. Exploit

More information

On universal types. Gadiel Seroussi Information Theory Research HP Laboratories Palo Alto HPL September 6, 2004*

On universal types. Gadiel Seroussi Information Theory Research HP Laboratories Palo Alto HPL September 6, 2004* On universal types Gadiel Seroussi Information Theory Research HP Laboratories Palo Alto HPL-2004-153 September 6, 2004* E-mail: gadiel.seroussi@hp.com method of types, type classes, Lempel-Ziv coding,

More information

Variable-to-Variable Codes with Small Redundancy Rates

Variable-to-Variable Codes with Small Redundancy Rates Variable-to-Variable Codes with Small Redundancy Rates M. Drmota W. Szpankowski September 25, 2004 This research is supported by NSF, NSA and NIH. Institut f. Diskrete Mathematik und Geometrie, TU Wien,

More information

Classes of Boolean Functions

Classes of Boolean Functions Classes of Boolean Functions Nader H. Bshouty Eyal Kushilevitz Abstract Here we give classes of Boolean functions that considered in COLT. Classes of Functions Here we introduce the basic classes of functions

More information

Asymptotic redundancy and prolixity

Asymptotic redundancy and prolixity Asymptotic redundancy and prolixity Yuval Dagan, Yuval Filmus, and Shay Moran April 6, 2017 Abstract Gallager (1978) considered the worst-case redundancy of Huffman codes as the maximum probability tends

More information

Everything You Always Wanted to Know about Quicksort, but Were Afraid to Ask. Marianne Durand

Everything You Always Wanted to Know about Quicksort, but Were Afraid to Ask. Marianne Durand Algorithms Seminar 200 2002, F. Chyzak (ed., INRIA, (2003, pp. 57 62. Available online at the URL http://algo.inria.fr/seminars/. Everything You Always Wanted to Know about Quicksort, but Were Afraid to

More information

arxiv: v1 [math.co] 22 Jan 2013

arxiv: v1 [math.co] 22 Jan 2013 NESTED RECURSIONS, SIMULTANEOUS PARAMETERS AND TREE SUPERPOSITIONS ABRAHAM ISGUR, VITALY KUZNETSOV, MUSTAZEE RAHMAN, AND STEPHEN TANNY arxiv:1301.5055v1 [math.co] 22 Jan 2013 Abstract. We apply a tree-based

More information

IN this paper, we consider the capacity of sticky channels, a

IN this paper, we consider the capacity of sticky channels, a 72 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 54, NO. 1, JANUARY 2008 Capacity Bounds for Sticky Channels Michael Mitzenmacher, Member, IEEE Abstract The capacity of sticky channels, a subclass of insertion

More information

Notes 3: Stochastic channels and noisy coding theorem bound. 1 Model of information communication and noisy channel

Notes 3: Stochastic channels and noisy coding theorem bound. 1 Model of information communication and noisy channel Introduction to Coding Theory CMU: Spring 2010 Notes 3: Stochastic channels and noisy coding theorem bound January 2010 Lecturer: Venkatesan Guruswami Scribe: Venkatesan Guruswami We now turn to the basic

More information

DISCRETE STOCHASTIC PROCESSES Draft of 2nd Edition

DISCRETE STOCHASTIC PROCESSES Draft of 2nd Edition DISCRETE STOCHASTIC PROCESSES Draft of 2nd Edition R. G. Gallager January 31, 2011 i ii Preface These notes are a draft of a major rewrite of a text [9] of the same name. The notes and the text are outgrowths

More information

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 Please submit the solutions on Gradescope. EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 1. Optimal codeword lengths. Although the codeword lengths of an optimal variable length code

More information

CS 6820 Fall 2014 Lectures, October 3-20, 2014

CS 6820 Fall 2014 Lectures, October 3-20, 2014 Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given

More information

CONSTRUCTION PROBLEMS

CONSTRUCTION PROBLEMS CONSTRUCTION PROBLEMS VIPUL NAIK Abstract. In this article, I describe the general problem of constructing configurations subject to certain conditions, and the use of techniques like greedy algorithms

More information

Introduction to information theory and coding

Introduction to information theory and coding Introduction to information theory and coding Louis WEHENKEL Set of slides No 5 State of the art in data compression Stochastic processes and models for information sources First Shannon theorem : data

More information

Entropy Rate of Stochastic Processes

Entropy Rate of Stochastic Processes Entropy Rate of Stochastic Processes Timo Mulder tmamulder@gmail.com Jorn Peters jornpeters@gmail.com February 8, 205 The entropy rate of independent and identically distributed events can on average be

More information

EE5139R: Problem Set 4 Assigned: 31/08/16, Due: 07/09/16

EE5139R: Problem Set 4 Assigned: 31/08/16, Due: 07/09/16 EE539R: Problem Set 4 Assigned: 3/08/6, Due: 07/09/6. Cover and Thomas: Problem 3.5 Sets defined by probabilities: Define the set C n (t = {x n : P X n(x n 2 nt } (a We have = P X n(x n P X n(x n 2 nt

More information

On Buffon Machines & Numbers

On Buffon Machines & Numbers On Buffon Machines & Numbers Philippe Flajolet, Maryse Pelletier, Michèle Soria AofA 09, Fréjus --- June 2009 [INRIA-Rocquencourt & LIP6, Paris] 1 1733: Countess Buffon drops her knitting kit on the floor.

More information

Foundations of Mathematics MATH 220 FALL 2017 Lecture Notes

Foundations of Mathematics MATH 220 FALL 2017 Lecture Notes Foundations of Mathematics MATH 220 FALL 2017 Lecture Notes These notes form a brief summary of what has been covered during the lectures. All the definitions must be memorized and understood. Statements

More information

Coding on Countably Infinite Alphabets

Coding on Countably Infinite Alphabets Coding on Countably Infinite Alphabets Non-parametric Information Theory Licence de droits d usage Outline Lossless Coding on infinite alphabets Source Coding Universal Coding Infinite Alphabets Enveloppe

More information

Let us first give some intuitive idea about a state of a system and state transitions before describing finite automata.

Let us first give some intuitive idea about a state of a system and state transitions before describing finite automata. Finite Automata Automata (singular: automation) are a particularly simple, but useful, model of computation. They were initially proposed as a simple model for the behavior of neurons. The concept of a

More information

UNIT I INFORMATION THEORY. I k log 2

UNIT I INFORMATION THEORY. I k log 2 UNIT I INFORMATION THEORY Claude Shannon 1916-2001 Creator of Information Theory, lays the foundation for implementing logic in digital circuits as part of his Masters Thesis! (1939) and published a paper

More information

Chapter 2 Date Compression: Source Coding. 2.1 An Introduction to Source Coding 2.2 Optimal Source Codes 2.3 Huffman Code

Chapter 2 Date Compression: Source Coding. 2.1 An Introduction to Source Coding 2.2 Optimal Source Codes 2.3 Huffman Code Chapter 2 Date Compression: Source Coding 2.1 An Introduction to Source Coding 2.2 Optimal Source Codes 2.3 Huffman Code 2.1 An Introduction to Source Coding Source coding can be seen as an efficient way

More information

Almost all trees have an even number of independent sets

Almost all trees have an even number of independent sets Almost all trees have an even number of independent sets Stephan G. Wagner Department of Mathematical Sciences Stellenbosch University Private Bag X1, Matieland 7602, South Africa swagner@sun.ac.za Submitted:

More information

Text Compression. Jayadev Misra The University of Texas at Austin December 5, A Very Incomplete Introduction to Information Theory 2

Text Compression. Jayadev Misra The University of Texas at Austin December 5, A Very Incomplete Introduction to Information Theory 2 Text Compression Jayadev Misra The University of Texas at Austin December 5, 2003 Contents 1 Introduction 1 2 A Very Incomplete Introduction to Information Theory 2 3 Huffman Coding 5 3.1 Uniquely Decodable

More information

arxiv:math.pr/ v1 17 May 2004

arxiv:math.pr/ v1 17 May 2004 Probabilistic Analysis for Randomized Game Tree Evaluation Tämur Ali Khan and Ralph Neininger arxiv:math.pr/0405322 v1 17 May 2004 ABSTRACT: We give a probabilistic analysis for the randomized game tree

More information

Lecture 7: More Arithmetic and Fun With Primes

Lecture 7: More Arithmetic and Fun With Primes IAS/PCMI Summer Session 2000 Clay Mathematics Undergraduate Program Advanced Course on Computational Complexity Lecture 7: More Arithmetic and Fun With Primes David Mix Barrington and Alexis Maciel July

More information

Equational Logic. Chapter Syntax Terms and Term Algebras

Equational Logic. Chapter Syntax Terms and Term Algebras Chapter 2 Equational Logic 2.1 Syntax 2.1.1 Terms and Term Algebras The natural logic of algebra is equational logic, whose propositions are universally quantified identities between terms built up from

More information

Introduction to Theory of Computing

Introduction to Theory of Computing CSCI 2670, Fall 2012 Introduction to Theory of Computing Department of Computer Science University of Georgia Athens, GA 30602 Instructor: Liming Cai www.cs.uga.edu/ cai 0 Lecture Note 3 Context-Free Languages

More information