LIMITING DISTRIBUTIONS FOR THE NUMBER OF INVERSIONS IN LABELLED TREE FAMILIES

LIMITING DISTRIBUTIONS FOR THE NUMBER OF INVERSIONS IN LABELLED TREE FAMILIES ALOIS PANHOLZER AND GEORG SEITZ Abstract. We consider so-called simple families of labelled trees, which contain, e.g., ordered, unordered, binary and cyclic labelled trees as special instances, and study the global and local behaviour of the number of inversions. In particular we obtain limiting distribution results for the total number of inversions as well as the number of inversions induced by the node labelled j in a random tree of size n.. Introduction Throughout this paper we always consider rooted trees T in which the vertices are labelled with distinct integers of {,..., T }, where T is the size i.e., the number of vertices of T. An inversion in a tree T is a pair i, j of vertices we may always identify a vertex with its label, such that i > j and i lies on the unique path from the root node roott of T to j thus i is an ascendant of j or, equivalently, j is a descendant of i. Let us denote by invt the number of inversions in T. In [6, ] studies concerning the number of inversions in some important combinatorial tree families T have been given by introducing so-called tree inversion polynomials. They shall be defined as follows : J n q := q invt. T T : T =n Actually, unlike in our studies, in [6, ] the authors exclusively considered trees with the root node labelled. Thus, in order to avoid confusion, we introduce also the slightly modified polynomials Ĵnq := T T : q invt. For unordered trees, i.e., trees, where one T =n and roott = assumes that to each vertex there is attached a possibly empty set of children thus there is no left-to-right ordering of the children of any node, Mallows and Riordan [] could give an explicit formula for a suitable generating function of the corresponding tree inversion polynomials: exp q n Ĵ n q tn = q n tn n! n!. n n 0 Gessel et al. [6] considered Ĵnq for three other tree families: Ordered trees: one assumes that to each vertex there is attached a possibly empty sequence of children thus there is a left-to-right ordering of the children of each node. Cyclic trees: ordered trees, where one assumes that cyclic rearrangements of the subtrees of any node give the same tree. Plane trees: ordered trees, where one assumes that cyclic rearrangements of the subtrees of the root node give the same tree. Unlike for unordered trees, no explicit formulæ for a suitable generating function of the tree inversion polynomial of the latter tree families could be given, but the authors provide exact and Date: January 5, 0. 000 Mathematics Subject Classification. 05C05, 60F05, 05A6. Key words and phrases. inversions, simply generated trees, limiting distributions. This work has been supported by the Austrian Science Foundation FWF, grant S9608-N3. Later we consider weighted tree families and introduce extensions of this and further definitions.

asymptotic results for the evaluations of Ĵnq for the specific values q = 0,,. In particular, Ĵ n 0 enumerates so-called increasing trees, i.e., trees, where each child node has a label larger than its parent node. Besides these studies it seems natural to ask, for a given combinatorial family T of trees, questions about the typical behaviour of the number of inversions in a tree T T of size n. In a probabilistic setting we may introduce a random variable I n, which counts the number of inversions of a random tree of size n, i.e., a tree chosen uniformly at random from all trees of the family T of size n. Of course, this more probabilistic point of view and the before-mentioned combinatorial approach are closely related. Let us denote by T n the number of trees of T of size n. Then it holds J n q = T n P{I n = k}q k, k 0 i.e., the probability generating function p n q := k 0 P{I n = k}q k of the random variable I n is simply given by p n q = Jnq T n, and it holds T n,k = T n P{I n = k} for the number T n,k := [q k ]J n q of trees of size n with exactly k inversions. A main concern of this paper is to describe the asymptotic behaviour of the random variable I n for various important tree families by proving limiting distribution results. In our studies of I n we use as tree-models so-called simply generated tree families [5, ], which contain many important combinatorial tree families, such as the before-mentioned unordered trees, ordered trees, and cyclic trees, but also others such as, e.g., binary trees, t-ary trees, and Motzkin trees, as special instances. Simply generated trees are weighted ordered trees, where, given a degreeweight sequence, each node gets a weight according to its out-degree, i.e., the number of its children see Subsection. for a precise definition; we remark that in probability theory such tree models are known as Galton-Watson trees. As a main result we can show, provided the degree-weight sequence satisfies certain mild growth conditions which are all satisfied for the before-mentioned tree families, that, after a suitable normalization of order n 3, I n converges in distribution to a distribution known as the Airy distribution see Subsection 3.. We remark that the Airy distribution also appears in enumerative studies of other combinatorial objects such as, e.g., the area below lattice paths [9], the area of staircase polygons [4], sums of parking functions [7], and the costs of linear probing hashing algorithms [4]. For the particular tree family of unordered trees this limiting distribution result for I n has been shown already by Flajolet et al. in [4] during their analysis of a linear probing hashing algorithm by using close relations between the insertion costs of this algorithm and the number of inversions in unordered trees. We note that we show convergence in distribution, thus obtaining asymptotic results for P{I n xn 3 } or alternatively for the sums k xn 3 T n,k, with x R +, but we do not obtain local limit laws, i.e., results concerning the behaviour of the probabilities P{I n = k} or the numbers T n,k itself. Besides this global study of the number of inversions in a random tree we are additionally interested in the contribution to this quantity induced by a specific label j, i.e., in a local study. To do this we introduce random variables I n,j, which count the number of inversions of the kind i, j, with i > j an ancestor of j, in a random tree of size n. Of course, one could also introduce local inversion polynomials J n,j q := T n k 0 P{I n,j = k}q k. Note that I n = n j= I n,j, but the random variables I n,j are highly dependent. In our studies we describe the asymptotic behaviour of the random variable I n,j, depending on the growth of j = jn with respect to n. In particular, we obtain that for the main portion of labels, i.e., for j n n, I n,j converges, after suitable normalization of order n, in distribution to a Rayleigh distribution. We remark that the Rayleigh distribution also appears frequently when studying combinatorial objects, see, e.g., [5]. If n j ρ n or n j = o n then the behaviour changes. Apart from asymptotic results, we can for two particular tree families, namely ordered and unordered trees, also give explicit formulæ for the probabilities P{I n,j = k}. An example of a labelled tree and the parameters studied is given in Figure

3 7 6 5 4 Figure. A binary labelled tree of size 7 with a total number of 6 inversions, namely 3,, 6,, 3,, 6,, 6, 4, 7, 5. Thus two inversions each are induced by the nodes and, whereas one inversion each is induced by the nodes 4 and 5. We remark that the asymptotic results obtained for inversions in trees are completely different from the corresponding ones for permutations of a set {,,..., n}. It is well-known see, e.g., [0] that the total number of inversions in a random permutation of size n is asymptotically normal distributed with expectation and variance of order n and n 3, respectively. Trivially, the number of inversions of the kind i, j, with i > j an element to the left of j, in a random permutation of size n is uniformly distributed on {0,,..., n j}. The plan of this paper is as follows. In Section we collect definitions and known results about simply generated tree families, whereas in Section 3 we state the main results of this paper concerning the random variables I n and I n,j. A proof of the results for I n and I n,j is given in Section 4 and Section 5, respectively. Before continuing we define some notation used throughout this paper. The operator [z n ] extracts the coefficient of z n from a power series Az = n 0 a nz n, i.e., [z n ]Az = a n. For s N 0, x s resp. x s denotes the s-th falling resp. rising factorial of x, i.e., x 0 = x 0 =, and x s = xx x s +, x s = xx + x + s, for s. Furthermore, for each variable x we denote the differential operator with respect to x by D x, and we define two operators V = V q and Z, which act on bivariate power series Gz, q by VGz, q := Gz,, and ZGz, q := zgz, q analogous definitions for multivariate power series. Moreover, if X d and X n, n, are random variables, then X n X denotes the weak convergence i.e., the convergence in distribution of the sequence X n n to X.. Labelled families of simply generated trees and auxiliary results Families of simply generated trees were introduced by Meir and Moon in []. As mentioned before, many important combinatorial tree families such as, e.g., labelled unordered trees also called Cayley trees, binary trees, labelled cyclic trees also called mobile trees and ordered trees also called planted plane trees, can be considered as special instances of simply generated trees. We now recall how simply generated tree families are defined in the labelled context, and then collect some well-known auxiliary results. Note that in the following the term tree will always denote a labelled tree... Definitions. A class T of labelled simply generated trees is defined in the following way: One chooses a sequence ϕ l l 0 the so-called degree-weight sequence of nonnegative real numbers with ϕ 0 > 0. Using this sequence, the weight wt of each ordered tree i.e., each rooted tree, in which the children of each node are ordered from left to right is defined by wt := v T ϕ dv, where by v T we mean that v is a vertex of T and dv denotes the number of children of v i.e., the out-degree of v. The family T associated to the degree-weight sequence ϕ l l 0 then consists of all trees T or all trees T with wt 0 together with their weights. 3

We let T n := T =n wt denote the the total weight of all trees of size n in T, and define z by T z := T n n n! its exponential generating function. Then it follows that T z satisfies the formal functional equation T z = zϕt z, where the degree-weight generating function ϕt is defined via ϕt := l 0 ϕ lt l. We want to remark that each simply generated tree family T can also be defined by a formal equation of the form T = ϕt, where denotes a node, is the combinatorial product of labelled objects, and ϕt is a certain substituted structure see, e.g., [5]. Hence, the functional equation can be obtained directly from the combinatorial construction of T using the symbolic method cf. [5]. Furthermore, T n n is for many important simply generated tree families a sequence of natural numbers, and then the total weight T n can be interpreted as the number of trees of size n in T. We now give several examples where this is the case. Examples: Binary trees can be defined combinatorially as follows: T = { } T { } T. Here, denotes an empty subtree and is the disjoint union. This formal equation expresses that each binary tree consists of a root node and a left and a right subtree, each of which is either a binary tree or empty. The formal equation for T can directly be translated into a functional equation for T z, namely T z = z + T z. Hence, binary trees are the simply generated tree family defined by ϕt = + t, i.e., by the degree-weight sequence ϕ l = l, l 0. Ordered trees are rooted trees, in which the children of each node are ordered. Thus, combinatorially speaking, each ordered tree consists of a root node and a sequence of ordered trees, T = SeqT = { } T T T 3.... From this one gets the functional equation z T z = T z, i.e., ϕt = t. Of course, this corresponds to the degree-weight sequence ϕ l =, l 0. Unordered trees are rooted trees in which there is no order on the children of any node. Hence, each unordered tree consists of a root node and a set of unordered trees, which can be written formally as T = SetT = { } T T! This leads to the functional equation T z = z expt z, T 3 3!.... i.e., one has ϕt = expt, or equivalently ϕ l = /l!, l 0. Cyclic trees may be considered as equivalence classes of ordered trees, where cyclic rearrangements of the subtrees of nodes lead to a tree of the same class. Hence, each cyclic tree is either a single root node or it consists of a root node and a non-empty cycle of unordered trees, which can be written formally as T = CycT = { } CycT. 4

This leads to the functional equation T z = z + log T z, i.e., one has ϕt = + log t, or equivalently ϕ0 = and ϕ l = /l, l. We remark that plane trees as considered in [6] are not covered by the definition of simply generated trees. However, since every subtree of the root node of a plane tree is an ordered tree, the methods applied in this work for a study of the number of inversions can be adapted easily to treat also this tree family, which leads to the same limiting distribution results as for ordered trees. Thus we omit computations for this tree family... Auxiliary results. We now collect some known results see, e.g., [5, 3] on the function T z satisfying. First note that in general T z and ϕt must be regarded as formal power series, because they do not need to have a positive radius of convergence, and then must be understood as a formal equation. Thus, in order to analyze properties of simply generated tree families by analytic methods, we will need to make certain assumptions on ϕ. In particular, we will assume that ϕt has a positive radius of convergence R, and that there exists a minimal positive solution τ < R of the equation If we define tϕ t = ϕt. 3 d := gcd{l : ϕ l > 0}, 4 it then follows that 3 has exactly d solutions of smallest modulus, which are given by τ j = ω j τ, for 0 j d, where ω = exp πi d. From the implicit function theorem it follows that the equation z = t ϕt is not invertible in any neighbourhood of t = τ j, for 0 j d. This leads to d dominant singularities of T z at z = ρ j, where ρ j = ω j ρ, ρ = τ ϕτ. For our purpose, it is important to note that under the above assumptions, T z is amenable to singularity analysis cf. [3], i.e., there are constants η > 0 and 0 < φ < π/ such that T z is analytic in the domain {z C : z < ρ + η, z ρ j, Argz ρ j > φ, for all 0 j d }. The local expansion of T z around the singularity z = ρ j is given by T z = τ j ω j ϕτ ϕ z + O ρ j z. 5 τ ρ j Using singularity analysis and summing up the contributions of the d dominant singularities, one obtains T n n! = d ϕτ [zn ]T z = + O, 6 πϕ τρ n n 3 n for n mod d. If n mod d, one has of course T n = 0, because in this case each ordered tree of size n has weight zero. In our analysis, we will further make use of the functions ϕ m T z where ϕ m t is the m-th derivative of ϕt. Each of these functions has d dominant singularities at z = ρ j, 0 j d, and complies with the requirements for singularity analysis. Around z = ρ j, one has the expansion ϕ m T z = ϕ m τ j ϕ m+ τ j ω j τ ρϕ z + O ρ j z, 7 τ ρ j and we will especially make use of the expansion zϕ T z = ρτϕ τ z + O ρ j z. 8 ρ j 5

3. Parameters studied and results 3.. Parameters studied. Consider a simply generated tree family T associated to a degreeweight sequence ϕ l l 0. In our analysis of parameters in trees of T we will always use the random tree model for weighted trees, i.e., when speaking about a random tree of size n we assume that each tree T in T of size n is chosen with a probability proportional to its weight wt. The main quantities of interest are the random variable I n, which counts the total number of inversions of a random simply generated tree of size n, and the random variable I n,j, which counts the number of inversions of the kind i, j, with i > j an ancestor of j, in a random simply generated tree of size n. We mention the relation to a suitably adapted tree inversion polynomial for weighted tree families: J n q := wt q invt = T n P{I n = k}q k. T T : T =n 3.. Auxiliary results for probability distributions. We collect some basic facts about two important probability distributions appearing later in our analysis. Definition 3.. The Airy distribution is the distribution of a random variable I with r-th moments µ r := E I r = π Γ 3r C r, where the constants C r can inductively be defined by r r C r = 3r 4rC r + j j= k 0 C j C r j, r, C =. 9 Since we will use the method of moments in order to establish our results, the following well-known result about the Airy distribution is important see, e.g., [], where one can find more details about the Airy distribution and some equivalent definitions. Lemma 3.. The Airy distribution is uniquely determined by its sequence of moments µ r r. Definition 3.3. The Rayleigh distribution with parameter σ > 0 is the distribution of a random variable X σ with probability density function f σ x = x x e σ σ, x > 0. 0 The following basic fact about the Rayleigh distribution will be required in our analysis. Lemma 3.4. The Rayleigh distribution is uniquely determined by its sequence of r-th moments µ r r, which are given as follows: µ r := E Xσ r = σ r r r Γ +. 3.3. Results. Let T be the labelled family of simply generated trees associated to a degreeweight sequence ϕ l l 0, where the function ϕt := l 0 ϕ lt l has positive radius of convergence R, and equation 3 has a minimal positive solution τ < R. Furthermore, let ρ = τ ϕτ recall the definitions of Subsection.. Then the following holds: Theorem 3.5 Global behaviour. The random variable I n, which counts the total number of inversions in a random tree of size n of T is, after proper normalization, asymptotically Airy distributed: It holds that E I n c ϕ πn 3/, where c ϕ =, and 8ρτϕ τ I n c ϕ n 3/ where I is an Airy distributed random variable. 6 d I,

Theorem 3.6 Local behaviour. The random variable I n,j, which counts the number of inversions of the kind i, j, with i > j an ancestor of j, in a random tree of size n of T has, depending on the growth of j = jn n, the following asymptotic behaviour. Region n j n: I n,j is, after proper normalization, asymptotically Rayleigh distributed: n n j I d n,j X σ, where X σ is a Rayleigh distributed random variable with parameter σ :=. ρτϕ τ Region n j α n, with α R + : I n,j converges in distribution to a discrete random variable Y γ, with and γ := α. ρτϕ τ P{Y γ = k} = γk k! 0 x k+ e x γx dx, k 0, Region n j n: I n,j converges in distribution to a random variable with all its mass d concentrated at 0, i.e., I n,j 0. 3.4. Examples: Before we prove these results, we apply them to our example tree families: Binary trees: From the equation tt + = tϕ t = ϕt = t + we get the positive solution τ =, and hence c ϕ = and σ =. Thus, if we let I n denote the number of inversions in a random binary tree of size n, then In converges in distribution to n 3/ an Airy distributed random variable. Furthermore, for the number I n,j of inversions in a random binary tree of size n induced by node j, it holds that n n j I n,j converges, for n j n, in distribution to a Rayleigh distributed random variable with parameter. t Ordered trees: The equation = t t yields τ =, and further c ϕ = 4 and σ =. Hence, for the number I n of inversions in a random ordered tree of size n, it holds that 4In is asymptotically Airy distributed. Furthermore, the normalized number of n 3/ inversions induced by node j, is, for n j n, asymptotically Rayleigh n n j I n,j distributed with parameter /. We further note that for ordered trees the exact distribution of I n,j is given as follows for j n and 0 k n j: P{I n,j = k} = n n j n n k l=n j k l n n l n l n j k l k n l. Unordered trees: Here, one has τ = and thus c ϕ = 8 and σ =. This shows that 8In n 3/ n n j I n,j converges in distribution to an Airy distributed random variable and that converges, for n j n, in distribution to a Rayleigh distributed random variable with parameter. Also for unordered trees the exact distribution of I n,j can be stated explicitly. It holds for j n and 0 k n j: n k j!n j! l n l n ln l P{I n,j = k} = n n. 3 n j k k l! l=n j k Cyclic trees: The positive real solution of the equation given by τ 0.6855. One further gets c ϕ = 0.563776. Thus, I n c ϕn 3/ τ 8 t t = + log t is numerically 0.9935 and σ = τ converges in distribution to an Airy distributed random variable, 7

and n n j I n,j converges, for n j n, in distribution to a Rayleigh distributed random variable with parameter σ. 4. Proof of the results concerning the global behaviour 4.. Short overview of the proof. We prove our result given in Theorem 3.5 by using the method of moments, i.e., we show that the moments of I n converge after proper normalization to the moments of the Airy distribution. Since this distribution is uniquely determined by its moments, the convergence result then follows directly from the theorem of Fréchet and Shohat [8]. To start with, we do not study the random variable I n directly, but consider a closely related random variable În. Using the tree decomposition as in [6], we then obtain a q-difference-differential equation for a suitably chosen generating function which encodes the distribution of În. From this equation, we can pump the moments of În using techniques from [4] and singularity analysis, and finally transfer our result to I n. 4.. Introduction of În and generating functions. We let ˆT be the subset of T which consists exactly of those trees in which the root has label. Obviously, the total weight of trees of size n in ˆT is then given by Tn n. Also note that each tree in ˆT has the nice property that the root is not part of any inversion. Hence, the total number of inversions can just be obtained by summing up the contributions of the individual subtrees of the root. This fact will later be useful when we translate a decomposition of the trees in ˆT to generating functions. We let În denote the number of inversions in a random tree of size n in ˆT, where each element of ˆT of size n is chosen with probability proportional to its weight. Furthermore, we introduce the generating function F z, q := n P {În = k k 0 } Tn n qk zn n!. 4 Note that n![z n q k ]F z, q = P {În = k} Tnn is the total weight of all trees of size n in ˆT which contain exactly k inversions. Moreover, observe that VF z, q = F z, is just the exponential generating function of T nn, and hence we have the relation n which we will use frequently. We further introduce the functions ZD z VF z, q = T z, 5 f r z := VD r qf z, q, which are generating functions of the factorial moments E În r of În, in the sense that f r z = Tn z E Îr n n n n!. 6 n Clearly, we can recover the r-th factorial moment of În from 6 by E Îr n = [zn ]f r z [z n ]f 0 z, but as we will see later, it is more convenient to use E Îr n = [zn ]zf rz [z n ]zf 0 z = [zn ]zf rz [z n ]T z, 7 where the second equality follows from 5. 8

4.3. The q-difference-differential equation for F z, q. It turns out that F z, q satisfies a certain equation involving a q-difference operator H which is very similar to the one Flajolet, Poblete and Viola used in [4] in their analysis of linear probing hashing. In our case, we define H by Gz, q Gqz, q HGz, q :=. q Using this, we get: Lemma 4.. The function F z, q defined by 4 satisfies D z F z, q = ϕhf z, q. 8 Proof. This equation can be obtained by establishing mutually dependent recurrences for the total weights T n,k := P {I n = k} T n and ˆT n,k := P {În = k} Tnn of trees of size n with k inversions in T and ˆT, respectively. Nevertheless, we confine ourselves to give a combinatorial argument at this point. In order to derive 8, we establish relations between T and ˆT, which can be translated into functional equations for F z, q = n k 0 ˆT n,k q k zn n! and T z, q := n k 0 T n,k q k zn n!. For this purpose, we consider the sets T n and ˆT n, which contain exactly the trees of size n of T and ˆT, respectively. Clearly, T n can be partitioned into n disjoint subsets T n = ˆT n, T n..., T n n, where T n j contains exactly those trees in which the root is labelled by j. Now consider the bijective mapping between ˆT n and T n which is obtained by just switching the labels and in each tree, and leaving all other labels and the structure of each tree unchanged. Since this mapping does not alter the relative order of any pair of nodes except,, it clearly holds that each tree of ˆT n with k inversions is mapped to a tree of T n with k + inversions. Repeating this argument, we see that each tree in ˆT n with k inversions can bijectively be mapped to a tree with k + j inversions in T n j. This leads for the generating functions F z, q and T z, q to the equation T z, q = ˆT n,k +... + q n q k zn = HF z, q. 9 }{{} n! n k 0 q n q Next, remember that T is defined by the formal equation T = ϕt, and that ˆT consists exactly of those trees of T in which the root has label. It thus follows that ˆT satisfies the formal equation ˆT = O ϕt. 0 Due to the observation that the root node O of any tree in ˆT does not contribute to the number of inversions, equation 0 can be translated by an application of the symbolic method to the differential equation D z F z, q = ϕt z, q. Using equation 9, we thus obtain 8. 4.4. Application of the pumping method. In order to extract expressions for the functions f r z as defined in 6 from 8, we use the pumping method from [4]. This method basically rests on the idea of applying the operator VD r q to the given functional equation involving H, and using a commutation rule for the operators VD r q and H. Since our operator H is slightly different from the one in [4], we will first establish the suitable commutation rule for our case. 9

Lemma 4.. The operator VD j qh satisfies the operator equation VD j qh = i=0 j s=0 j s s + Zs+ D s+ z VD j s q. Proof. Since all occurring operators are linear, it suffices to show that the two sides of the equation coincide when applied to a function of the form Gz, q = q k z n. Remember that Hq k z n = q k + q + + q n z n, and thus we have n n j j VD j qhq k z n = z n VD j qq k q i = z n k j s i s s = z n j = s=0 s=0 j k j s s! s n i=0 i=0 s=0 i s j j k j s ns+ z n j s s + = j = z n s=0 s=0 j s j n k j s s! s s + s + Zs+ D s+ z VD j s q q k z n. Using, we can now establish a recurrence for the derivatives f rz of the factorial moment generating functions: Lemma 4.3. The factorial moment generating functions f r z satisfy, for r, f rz r r = zϕ ϕ T z T z t t + zt+ f t+ r t z + t= r! k! k r! ϕk +...+k r T z k,...,k r B r where B :=, and for r. r m= m! m s=0 m s s + zs+ f s+ B r := { k, k,..., k r N r 0 : k + k +... + r k r = r }, km m s z, Proof. We apply VD r q to 8 and express D r qϕhf z, q using Faá di Bruno s formula for higher derivatives of composite functions, r D r qfgq = k! k r! Dk +...+k r D m q gq km q fgq, m! k,...,k r A r r! where A r := {k, k,..., k r N r 0 : k + k +... + rk r = r}. We then obtain the claimed result by applying Lemma 4., solving for f rz and using the fact that which follows from and 5. m= VϕHF z, q = ϕvhf z, q = ϕzd z VF z, q = ϕt z, 4.5. Singularity analysis. We now investigate the singular behaviour of the functions in in order to compute the factorial moments E Îr n asymptotically. In the following, we carry out only the computations for the case d = where d is defined by 4 and thus gives the number of dominant singularities of the functions considered. The general case runs completely analogous: when applying singularity analysis, one just has to take care of the contributions of all d singularities and add them. 0

In a first step, we want to find an asymptotic formula for the expected value E În. From Lemma 4.3, we have f z = ϕ T z z f 0 z zϕ T z, and using the fact that f 0 z = T zϕ T z = ϕt zϕ T z zϕ T z, which is easily obtained by differentiating and 5, we get f z = zϕ T z T z zϕ T z. Note that zϕ T z for z ρ, which can be seen by differentiating, and thus f z inherits the dominant singularity at z = ρ from T z and ϕ T z. Using the expansions 5 and 8, we find + O ρ z / τ + O ρ z / zf z = ρτϕ τ = 4ρϕ τ z ρ z ρ + O ρ z /, + O By applying basic singularity analysis, this immediately yields [z n ]zf z = + O 4ρ n+ ϕ τ ρ z /, z ρ. n /. Now, using 7 and 6, we get the expected value E În = [zn ]zf z [z n ]T z = π ρ 8ϕτϕ τ n3/ + O = c ϕ πn 3/ + O n /, n / where c ϕ is defined as in Theorem 3.5. We will now consider f rz for general r. It turns out that all f rz have a unique dominant singularity at z = ρ. The singular expansions around this point are given in the following lemma. Lemma 4.4. For r, each f rz has a unique dominant singularity at z = ρ, where the expansion zf rz = c r ϕτ C r ϕ ϕ τ 3r / + O ρ z /, z ρ, 4 z ρ holds. Here, the constants C r are defined as in 9. Proof. One easily checks that in the case r = equation 4 coincides with 3. For r > we proceed by induction, following the inductive definition 9 of the constants C r. So let r > and assume that 4 holds for all functions f j z with j < r. By the rules for singular differentiation [] we then also have the following singular expansions for the k-th derivatives f k j of the functions f j z, for all k and j < r: k 3j zf k j z = c r ϕτ C j ϕ ϕ τ 3j 3+k/ + O ρ k z ρ 3 ρ z /, z ρ. 5

From this one concludes that the dominant contributions in can only arise from the terms corresponding to t = and s = 0, i.e., zf rz z = zϕ ϕ T z r T z z f r z + r! k! k r! ϕk +...+k r T z k,...,k r B r Now note that r m= km m! zf mz = O r m= zf m z m! ρ z 3r k +...+k r / km + O ρ z /. hence the dominant terms in the remaining sum correspond to those k,..., k r B r with k +... + k r =, and we thus get zf rz z = zϕ ϕ T z r T z z f r z r r! + ϕ T zz f sz f r sz + O ρ z /. s! r s! s= Now, expanding the occurring functions using 7, 8 and 5, we obtain after some simplifications zf rz ρ = ρτϕ τ r ϕτ C r 3r z cr ϕ ϕ τ 3r +/ ρ ρ z ρ + r r ϕτ ϕ τc s s ϕc r s C s C r s ϕ ϕ τ 3s /+3r s / s= z ρ + O ρ z / = c r ϕ ϕτ ϕ τ = c r ϕτ ϕ ϕ τ 3r 4rC r + r r s= s Cs C r s 3r / + O z ρ z ρ C r 3r / + O, ρ z /, z ρ. ρ z /, Lemma 4.4 can now be used in order to compute the moments of În asymptotically: Lemma 4.5. The random variable În satisfies E Îr n = πc r ϕn 3r/ Γ 3r C r + O n /. Proof. By singularity analysis, it follows from Lemma 4.4 that [z n ]zf rz = c r ϕτ ϕ ϕ τ n 3r Γ 3r C r and together with 7 and 6 this shows E Îr n = [zn ]zf rz [z n ]T z = πc r ϕ n 3r/ Γ 3r + O n /, C r + O n /.

Using the relation between the factorial moments and the ordinary moments of a random variable Y : r { } r EY r = EY l, 6 l l=0 with { } r l the Stirling numbers of second kind, we obtain that E Îr n = E Îr Îr n + O E n, and hence we get the desired result. 4.6. Transfer of the result to I n. We now transfer the result for În to the random variable I n, which counts the total number of inversions in a random tree of size n of T. In fact, we prove that the moments of În and I n coincide asymptotically: Lemma 4.6. The random variable I n satisfies E In r = πc r ϕn 3r/ Γ 3r C r + O n /. 7 Proof. The relation between T and ˆT compare equation 9 directly translates to the following relation between the moments of I n and În: E Îr n + E În + r +... + E În + n r. E In r = n From this, one easily deduces that E In r = E Îr n + O from Lemma 4.5. n 3r, and hence 7 follows directly By comparing 7 with µ r in Definition 3., we conclude that the moments of the normalized I random variable n converge to the moments of the Airy distribution. Due to Lemma 3., c ϕn 3/ the convergence in distribution of to an Airy distributed random variable thus follows I n c ϕn 3/ directly from the theorem of Fréchet and Shohat. This finishes the proof of our result on the total number of inversions. 5. Proof of the results concerning the local behaviour 5.. The generating functions approach. A main ingredient in the proof of Theorem 3.6 concerning the behaviour of the random variable I n,j is to introduce and study a suitable generating function for the probabilities P{I n,j = k}, which reflects in a simple way the recursive description of a tree as a root node and its subtrees. It turns out that the following trivariate generating function is appropriate: Nz, u, q := z j u m P{I m+j,j = k}t m+j j! m! qk. 8 m 0 j k 0 Proposition 5.. The generating function Nz, u, q is given by the following explicit formula: Nz, u, q = Proof. We will show the functional equation ϕt z + u z + uqϕ T z + u. Nz, u, q = ϕt z + u + zϕ T z + unz, u, q + uqϕ T z + unz, u, q, 9 which is equivalent to the statement of Proposition 5.. To do this we introduce specifically tricoloured trees: in each tree T T exactly one node is coloured red, all nodes with a label smaller than the red node are coloured white, whereas all nodes with a label larger than the red node are coloured black. Let us denote by T C the family of all such tricoloured trees. Then in 3

the generating function Nz, u, q the variable z encodes the white nodes, the variable u encodes the black nodes, whereas q encodes the black ancestors of the red node, i.e., Nz, u, q = wt C z white u black white! black! q black ancestors of red. T C T C Since the black nodes as well as the white nodes are labelled it is appropriate to use a double exponential generating function. As auxiliary family we consider specifically bilabelled trees: the nodes in each tree T T are coloured black and white in a way such that each white node has a label smaller than any black node i.e., all nodes up to a certain label are coloured white, whereas all remaining nodes are coloured black. Let us denote by T B the set of all such bicoloured trees. The double exponential generating function of bicoloured trees, can be computed easily. It holds: Bz, u = n 0 T T : T =n = n 0 n T n m=0 Bz, u = wt B z white u black white! black!, T B T B wt n m=0 z n m u m n m! m! = n 0 n m=0 z n m u m n m! m! T T : T =n wt z n m u m n m! m! = T n n! z + un = T z + u. 30 n 0 Now we consider the decomposition of a tricoloured tree T C T C into the root node roott C and its l 0 subtrees T,..., T l. Thus the degree-weight of the root node is given by ϕ l. Three cases may occur. i The root node is the red node. Then the red node does not have black ancestors and all of the subtrees T,..., T l are, after order preserving relabellings, specifically bicoloured trees, i.e., elements of T B. ii The root node is a white node. Then the red node is contained in one of the l subtrees; let us assume it is T s. After an order preserving relabelling this subtree is itself an element of T C, whereas all remaining subtrees are, after order preserving relabellings, elements of T B. Moreover, the number of black ancestors of the red node in T C is the same as the number of black ancestors of the red node in the subtree T s. iii The root node is a black node. Again the red node is contained in of the l subtrees; let us assume it is T s. After an order preserving relabelling this subtree is an element of T C, whereas all remaining subtrees are, after order preserving relabellings, elements of T B. But in this case the number of black ancestors of the red node in T C is one more than the number of black ancestors of the red node in the subtree T s. Considering all tricoloured trees of T C and taking into account 30 the above decomposition leads to the stated equation 9 for Nz, u, q: Nz, u, q = l 0 l l Nz, ϕ l T z + u + z lϕ l T z + u u, q + uq l 0 l 0 lϕ l T z + u l Nz, u, q = ϕt z + u + zϕ T z + unz, u, q + uqϕ T z + unz, u, q. 4

5.. Computing the factorial moments. Starting with the explicit formula for the trivariate generating function Nz, u, q given in Proposition 5. we will compute the r-th factorial moments of I n,j. According to the definition 8 of Nz, u, q one obtains: EI r j!n j! n,j = [z j u n j ]VD T qnz, r u, q n j! n j! r! ϕ = [z j u n j r T z + u r ϕt z + u ] T n z + uϕ T z + u r+. Since for any power series gx it holds: a + b [z a u b ]gz + u = [z a+b ]gz, a one further obtains the following expression, which will be the starting point for our asymptotic considerations: EI r j! n j! r! n r ϕ n,j = [z n r T z r ϕt z ] T n j zϕ T z r+ j! n j! r! n r zϕ = [z n T z r T z ] T n j zϕ T z r+. 3 In order to evaluate EI r n,j asymptotically we use the local expansions 5 and 8 and apply singularity analysis. Again for simplicity in presentation we will only carry out the computations for the case that the functions involved have d = dominant singularities see Subsection.; for d > one just has to add the contributions of all these singularities. We obtain then for r arbitrary, but fixed: zϕ [z n T z r T z + O z r ] zϕ T z r+ = ρ τ + O z ρ [zn ] ρτϕ τ z ρ + O z ρ r+ = [z n ] τ ρτϕ τ r+ z ρ r+ + O z ρ = τ n r + O n ρτϕ τ r+. ρ n Γ r+ Together with the asymptotic formula for T n given in 6, we obtain from 3 after simple computations: EI r n,j = n j!r!n r! πϕ τn 3 τn r n r j!n! ϕτρτϕ τ r+ Γ r+ + O n r! π n = jr ρτϕ τ r Γ r+ + O n n r. Using the duplication formula for the Gamma-function: Γ r + r Γ + = r! π r, we obtain the following expansion of the r-th factorial moment of I n,j, which holds uniformly for all j n: EI r n,j = Γ r + r n j r + O n ρτϕ τ r n r. 3 5.3. Limiting distributions by applying the method of moments. The asymptotic behaviour of the moments of I n,j depending on the growth of j = jn can be obtained easily from the uniform expansion 3. An application of the method of moments shows then the limiting distribution results stated in Theorem 3.6. 5

5.3.. Region n j n. For this region it holds n j r n r n jr = + O n, which implies the following expansion for the factorial moments: n r EI r n,j = r Γ r + n j r + O n ρτϕ τ r n r. 33 Together with equation 6 connecting the factorial and the ordinary moments we obtain the following asymptotic expansion for the r-th moments of I n,j : EIn,j r = r Γ r + n j r + O n. ρτϕ τ r n r n j Thus we obtain, for each r fixed and n : n r E n j I r r r n,j ρτϕ Γ τ +, n i.e., the moments of n j I n,j converge to the moments of a Rayleigh distributed random variable with parameter σ =. An application of the theorem of Fréchet and Shohat shows then ρτϕ τ the corresponding limiting distribution result of Theorem 3.6. 5.3.. Region n j α n, α R +. Also for this region the asymptotic expansion 33 of the r-th factorial moments computed above holds and one gets further: EI r n,j r Γ r + ρτϕ τ r To continue we require the following lemma. α r α = ρτϕ τ r r Γ r +. 34 Lemma 5.. Let Y γ, with γ > 0, be a discrete random variable with distribution P{Y γ = k} = γk k! 0 x k+ e x γx dx, for k 0. Then it holds that the r-th factorial moments of Y γ are given as follows: EYγ r = γ r r r Γ +. Moreover, the distribution of Y γ is uniquely defined by its sequence of moments. Proof. For r 0 we get the case r = 0 shows that the probabilities sum up to, i.e., they define indeed a distribution: EYγ r = k r γk k! k 0 = e x 0 0 x k+ e x γx dx = γx γ r x r+ e γx dx = γ r 0 0 e x γx γ r x r+ γx k r k r! dx k r x r+ e x dx = r γ r 0 u r e u du = γ r r r Γ +. To show that the sequence of moments uniquely characterizes the distribution we consider the moment generating function F s := Ee syγ of Y γ. It can be shown easily that F s is given by the following expression: F s = k 0 P{Y γ = k}e ks = 0 xe x βx dx, with β = γ e s. 6

Thus the moment generating function F s exists in a real neighbourhood of s = 0 actually it exists for all real s, which implies that the corresponding distribution is uniquely defined by its moments. Since the r-th factorial and thus also ordinary moments of I n,j converge to the corresponding α moments of Y γ, with γ =, an application of the theorem of Fréchet and Shohat shows ρτϕ τ also for this case the limiting distribution results stated in Theorem 3.6. 5.3.3. Region n j n. From 3 one easily gets that EIn,j r 0, for r, which, by an d application of the theorem of Fréchet and Shohat shows I n,j 0 as stated in the corresponding part of Theorem 3.6. 5.4. Explicit formulas for probabilities. For some particular tree families it is possible to obtain explicit formulas for the probabilities P{I n,j = k} by extracting coefficients from the trivariate generating function Nz, u, q as given in 9. E.g., for ordered and unordered trees ϕt = t and ϕt = e t, respectively the generating function Nz, u, q is given by the following expressions: Nz, u, q = Nz, u, q = T z + u T z + u z uq, with T z = z T z ordered trees, e T z+u z + uqe T z+u, with T z = z et z unordered trees. We omit here the necessary computations for extracting coefficients of Nz, u, q in order to obtain the required probabilities via P{I n,j = k} = j!n j! n! [z j u n j q k ]Nz, u, q [z n, ]T z but we stated the corresponding results in Subsection 3.4 as formulas and 3. References [] J. A. Fill, P. Flajolet, and N. Kapur. Singularity analysis, Hadamard products, and tree recurrences. Journal of Computational and Applied Mathematics, 74:7 33, 005. [] P. Flajolet and G. Louchard. Analytic variations on the Airy distribution. Algorithmica, 3:36 377, 00. [3] P. Flajolet and A. M. Odlyzko. Singularity analysis of generating functions. SIAM Journal on Discrete Mathematics, 3:6 40, 990. [4] P. Flajolet, P. V. Poblete, and A. Viola. On the analysis of linear probing hashing. Algorithmica, :490 55, 998. [5] P. Flajolet and R. Sedgewick. Analytic Combinatorics. Cambridge University Press, Cambridge, 009. [6] I. M. Gessel, Y.-N. Yeh, and B. E. Sagan. Enumeration of trees by inversions. Journal of Graph Theory, 9:435 459, 995. [7] J. P. S. Kung and C. Yan. Exact formulas for moments of sums of classical parking functions. Advances in Applied Mathematics, 3:45 44, 003. [8] M. Loève. Probability Theory I. Springer, New York, 4th edition, 977. [9] G. Louchard. Kac s formula, Lévy s local time and Brownian excursion. Journal of Applied Probability, :479 499, 984. [0] G. Louchard and H. Prodinger. The number of inversions in permutations: a saddle point approach. Journal of Integer Sequences, 6:Article 03..8, 003. [] C. Mallows and J. Riordan. The inverson enumerator for labeled trees. Bulletin of the American Mathematical Society, 74:9 94, 968. [] A. Meir and J. W. Moon. On the altitude of nodes in random trees. Canadian Journal of Mathematics, 30:997 05, 978. [3] A. Panholzer. The distribution of the size of the ancestor-tree and of the induced spanning subtree for random trees. Random Structures and Algorithms, 5:79 07, 004. [4] U. Schwerdtfeger, C. Richard, and B. Thatte. Area limit laws for symmetry classes of staircase polygons. Combinatorics, Probability and Computing, 9:44 46, 00. 7

Alois Panholzer, Institut für Diskrete Mathematik und Geometrie, Technische Universität Wien, Wiedner Hauptstr. 8-0/04, A-040 Wien, Austria E-mail address: Alois.Panholzer@tuwien.ac.at Georg Seitz, Institut für Diskrete Mathematik und Geometrie, Technische Universität Wien, Wiedner Hauptstr. 8-0/04, A-040 Wien, Austria E-mail address: Georg.Seitz@tuwien.ac.at 8