Proceedings of the 2015 Winter Simulation Conference L. Yilmaz, W. K. V. Chan, I. Moon, T. M. K. Roeder, C. Macal, and M. D. Rossetti, eds.


EFFICIENT SIMULATION FOR BRANCHING LINEAR RECURSIONS

Ningyuan Chen
Mariana Olvera-Cravioto

Industrial Engineering and Operations Research
Columbia University
New York, NY 10027, USA

ABSTRACT

We provide an algorithm for simulating the unique attracting fixed-point of linear branching distributional equations. Such equations appear in the analysis of information ranking algorithms, e.g., PageRank, and in the complexity analysis of divide and conquer algorithms, e.g., Quicksort. The naive simulation approach would be to simulate exactly a suitable number of generations of a weighted branching process, which has exponential complexity in the number of generations being sampled. Instead, we propose an iterative bootstrap algorithm that has linear complexity; we prove its convergence and the consistency of a family of estimators based on our approach.

1 INTRODUCTION

The complexity analysis of divide and conquer algorithms such as Quicksort (Rösler 1991, Fill and Janson 2001, Rösler and Rüschendorf 2001) and the more recent analysis of information ranking algorithms on complex graphs (e.g., Google's PageRank) (Volkovich and Litvak 2010, Jelenković and Olvera-Cravioto 2010, Chen, Litvak, and Olvera-Cravioto 2014) motivate the analysis of the stochastic fixed-point equation

$$R \stackrel{\mathcal{D}}{=} Q + \sum_{r=1}^{N} C_r R_r, \qquad (1)$$

where $(Q, N, C_1, C_2, \dots)$ is a real-valued random vector with $N \in \mathbb{N}$, and $\{R_i\}_{i \in \mathbb{N}}$ is a sequence of i.i.d. copies of $R$, independent of $(Q, N, C_1, C_2, \dots)$. More precisely, the number of comparisons required in Quicksort for sorting an array of length $n$, properly normalized, satisfies in the limit, as the array's length grows to infinity, a distributional equation of the form in (1).
In the context of ranking algorithms, it has been shown that the rank of a randomly chosen node in a large directed graph with $n$ nodes converges in distribution, as the size of the graph grows, to $R$, where $N$ represents the in-degree of the chosen node and the $\{C_i\}_{i \geq 1}$ are functions of the out-degrees and node attributes of its neighbors. In the complexity analysis of algorithms, knowing the distribution of $R$ makes it possible to estimate the moments and tail probabilities of the number of operations required to sort a list of numbers, which is important for benchmarking and worst-case analysis. In the case of information ranking algorithms, the distribution of $R$ can be used to determine what type of nodes are typically ranked highly, which in turn can be used to design new ranking algorithms capable of identifying pre-specified data attributes. As further motivation for the study of branching fixed-point equations, we mention the closely related maximum equation

$$R \stackrel{\mathcal{D}}{=} Q \vee \bigvee_{r=1}^{N} C_r R_r, \qquad (2)$$

with $(Q, N, C_1, C_2, \dots)$ nonnegative, which has been shown to appear in the analysis of the waiting time distribution in large queueing networks with parallel servers and synchronization requirements (Karpelevich, Kelbert, and Suhov 1994, Olvera-Cravioto and Ruiz-Lacedelli 2014). In this setting, $W = \log R$ represents the waiting time in stationarity of a job that, upon arrival to the network, is split into a number of subtasks requiring simultaneous service from a random subset of servers. Computing the distribution and the moments of $W$ is hence important for evaluating the performance of such systems (e.g., implementations of MapReduce and similar algorithms in today's cloud computing). Due to length limitations, we focus in this paper only on (1), but we mention that the algorithm we provide can easily be adapted to approximately simulate the solutions to (2) (see Remark 2).

Although the study of (1) and (2), and other max-plus branching recursions, has received considerable attention in recent years (Rösler 1991, Biggins 1998, Fill and Janson 2001, Rösler and Rüschendorf 2001, Aldous and Bandyopadhyay 2005, Alsmeyer, Biggins, and Meiners 2012, Alsmeyer and Meiners 2012, Alsmeyer and Meiners 2013, Jelenković and Olvera-Cravioto 2012b, Jelenković and Olvera-Cravioto 2012a, Jelenković and Olvera-Cravioto 2015), the current literature only provides results on the characterization of the solutions to (1) and (2), their tail asymptotics, and in some instances, their integer moments, which is not always enough for the applications mentioned above. It is therefore of practical importance to have a numerical approach for estimating both the distribution and the general moments of $R$. As a mathematical observation, we mention that both (1) and (2) are known to have multiple solutions (see, e.g., Biggins (1998), Alsmeyer, Biggins, and Meiners (2012), Alsmeyer and Meiners (2012), Alsmeyer and Meiners (2013) and the references therein for the characterization of the solutions).
However, in applications we are often interested in the so-called endogenous solution. This endogenous solution is the unique limit under iterations of the distributional recursion

$$R^{(k+1)} \stackrel{\mathcal{D}}{=} \sum_{r=1}^{N} C_r R_r^{(k)} + Q, \qquad (3)$$

where $(Q, N, C_1, C_2, \dots)$ is a real-valued random vector with $N \in \mathbb{N}$, and $\{R_i^{(k)}\}_{i \in \mathbb{N}}$ is a sequence of i.i.d. copies of $R^{(k)}$, independent of $(Q, N, C_1, C_2, \dots)$, provided one starts with an initial distribution for $R^{(0)}$ with sufficient finite moments (see, e.g., Lemma 4.5 in Jelenković and Olvera-Cravioto (2012a)). Moreover, asymptotics for the tail distribution of the endogenous solution are available under several different sets of assumptions on $(Q, N, C_1, C_2, \dots)$ (Jelenković and Olvera-Cravioto 2010, Jelenković and Olvera-Cravioto 2012b, Jelenković and Olvera-Cravioto 2012a, Olvera-Cravioto 2012).

As will be discussed later, the endogenous solution to (1) can be explicitly constructed on a weighted branching process. Thus, drawing some similarities with the analysis of branching processes, and the Galton-Watson process in particular, one could think of using the Laplace transform of $R$ to obtain its distribution. Unfortunately, the presence of the weights $\{C_i\}$ in the Laplace transform

$$\varphi(s) = E[\exp(-sR)] = E\left[\exp(-sQ) \prod_{i=1}^{N} \varphi(sC_i)\right]$$

makes its inversion problematic, making a simulation approach even more necessary. The first observation we make regarding the simulation of $R$ is that when $P(Q = 0) < 1$ it is enough to be able to approximate $R^{(k)}$ for fixed values of $k$, since both $R^{(k)}$ and $R$ can be constructed in the same probability space in such a way that the difference $R - R^{(k)}$ is geometrically small. More precisely, under very general conditions (see Proposition 1 in Section 2), there exist positive constants $K < \infty$ and $c < 1$ such that

$$E\left[\left|R - R^{(k)}\right|^{\beta}\right] \leq K c^{k+1}. \qquad (4)$$

Our goal is then to simulate $R^{(k)}$ for a suitably large value of $k$.
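The geometric bound (4) translates directly into a rule for choosing the truncation depth $k$: given $K$ and $c$ (which depend on the generic branching vector and are treated below as known inputs, an assumption made purely for illustration), the smallest $k$ with $K c^{k+1} \leq \varepsilon$ can be computed in closed form. A minimal sketch:

```python
import math

def depth_for_tolerance(K, c, eps):
    """Smallest k >= 0 with K * c**(k+1) <= eps, per the bound (4).

    K and c are the constants of (4); they are problem-dependent and
    assumed known here for illustration.
    """
    assert K > 0 and 0 < c < 1 and eps > 0
    return max(0, math.ceil(math.log(eps / K) / math.log(c)) - 1)

# e.g., K = 1 and c = 0.3 call for k = 11 levels at tolerance 1e-6
print(depth_for_tolerance(K=1.0, c=0.3, eps=1e-6))
```

Solving $K c^{k+1} \leq \varepsilon$ for $k$ gives $k \geq \log(\varepsilon/K)/\log c - 1$ (the inequality flips because $\log c < 0$), which is what the one-line formula rounds up.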

Figure 1: Weighted branching process (node weights $\Pi_\emptyset = 1$, $\Pi_{i_1} = C_{i_1}$, $\Pi_{(i_1,i_2)} = C_{(i_1,i_2)} C_{i_1}$, etc.).

The simulation of $R^{(k)}$ is not that straightforward either, since the naive approach of simulating i.i.d. copies of $(Q, N, C_1, C_2, \dots)$ to construct a single realization of a weighted branching process, up to, say, $k$ generations, is of order $(E[N])^k$. Considering that in the examples mentioned earlier we typically have $E[N] > 1$ ($N \equiv 2$ for Quicksort, $E[N] \approx 30$ in many information ranking applications, and $E[N]$ in the hundreds for MapReduce implementations), this approach is prohibitive. Instead, we propose in this paper an iterative bootstrap algorithm that outputs a sample pool of observations $\{\hat R_i^{(k,m)}\}$ whose empirical distribution converges, in the Kantorovich-Rubinstein distance, to that of $R^{(k)}$ as the size $m$ of the pool grows to infinity. This mode of convergence is equivalent to weak convergence plus convergence of the first absolute moments (see, e.g., Villani (2009)). Moreover, the complexity of our proposed algorithm is linear in $k$. This algorithm is known in the statistical physics literature as "population dynamics" (see, e.g., Mezard and Montanari (2009)), where it has been used heuristically for the approximation of belief propagation algorithms.

The paper is organized as follows. Section 2 describes the weighted branching process and the linear recursion. The algorithm itself is given in Section 3, which includes a remark on how to adapt it to the maximum case. Section 4 introduces the Kantorovich-Rubinstein distance and proves the convergence properties of our proposed algorithm. Numerical examples to illustrate the precision of the algorithm are presented in Section 5.

2 LINEAR RECURSIONS ON WEIGHTED BRANCHING PROCESSES

As mentioned in the introduction, the endogenous solution to (1) can be explicitly constructed on a weighted branching process.
To describe the structure of a weighted branching process, let $\mathbb{N}_+ = \{1, 2, 3, \dots\}$ be the set of positive integers and let $U = \bigcup_{k=0}^{\infty} (\mathbb{N}_+)^k$ be the set of all finite sequences $\mathbf{i} = (i_1, i_2, \dots, i_n)$, $n \geq 0$, where by convention $\mathbb{N}_+^0 = \{\emptyset\}$ contains the null sequence $\emptyset$. To ease the exposition, we will use $(\mathbf{i}, j) = (i_1, \dots, i_n, j)$ to denote the index concatenation operation. Next, let $(Q, N, C_1, C_2, \dots)$ be a real-valued vector with $N \in \mathbb{N}$. We will refer to this vector as the generic branching vector. Now let $\{(Q_\mathbf{i}, N_\mathbf{i}, C_{(\mathbf{i},1)}, C_{(\mathbf{i},2)}, \dots)\}_{\mathbf{i} \in U}$ be a sequence of i.i.d. copies of the generic branching vector. To construct a weighted branching process we start by defining a tree as follows: let $A_0 = \{\emptyset\}$ denote the root of the tree, and define the $n$th generation according to the recursion

$$A_n = \{(\mathbf{i}, i_n) \in U : \mathbf{i} \in A_{n-1},\ 1 \leq i_n \leq N_\mathbf{i}\}, \qquad n \geq 1.$$

Now, assign to each node $\mathbf{i}$ in the tree a weight $\Pi_\mathbf{i}$ according to the recursion

$$\Pi_\emptyset = 1, \qquad \Pi_{(\mathbf{i}, i_n)} = C_{(\mathbf{i}, i_n)} \Pi_\mathbf{i}, \qquad n \geq 1,$$

see Figure 1. If $P(N < \infty) = 1$ and $C_i \equiv 1$ for all $i$, the weighted branching process reduces to a Galton-Watson process.
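The tree construction above is straightforward to turn into a naive simulator for the sums $\sum_{j=0}^{k} \sum_{\mathbf{i} \in A_j} Q_\mathbf{i} \Pi_\mathbf{i}$ studied next. The sketch below assumes, purely for illustration, the distributions of the first numerical example in Section 5 ($Q \sim$ Uniform$[0,1]$, $N \sim$ Poisson(3), i.i.d. $C_i \sim$ Uniform$[0,0.2]$, mutually independent); its expected cost of order $(E[N])^k$ nodes per realization is exactly the bottleneck our algorithm avoids:

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative generic branching vector (assumed, from Section 5's first
# example): Q ~ Uniform[0,1], N ~ Poisson(3), i.i.d. C_i ~ Uniform[0,0.2].
sample_Q = lambda: rng.uniform(0.0, 1.0)
sample_N = lambda: rng.poisson(3.0)
sample_C = lambda n: rng.uniform(0.0, 0.2, size=n)

def simulate_R_k(k):
    """One exact draw of sum_{j<=k} sum_{i in A_j} Q_i Pi_i, built
    generation by generation; expected cost ~ (E[N])**k nodes."""
    total = 0.0
    pis = np.array([1.0])                     # Pi-weights of generation A_j
    for j in range(k + 1):
        if pis.size == 0:                     # the tree died out
            break
        total += np.dot(np.array([sample_Q() for _ in pis]), pis)
        if j < k:                             # weights of generation A_{j+1}
            pis = np.concatenate([pi * sample_C(sample_N()) for pi in pis])
    return total

samples = np.array([simulate_R_k(5) for _ in range(200)])
print(samples.mean())   # crude Monte Carlo estimate of a 5-level sum
```

Even at depth 5 each realization already visits hundreds of nodes on average, which is why the bootstrap algorithm of Section 3 replaces this exact construction.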

For a weighted branching process with generic branching vector $(Q, N, C_1, C_2, \dots)$, define the process $\{R^{(k)} : k \geq 0\}$ as follows:

$$R^{(k)} = \sum_{j=0}^{k} \sum_{\mathbf{i} \in A_j} Q_\mathbf{i} \Pi_\mathbf{i}, \qquad k \geq 0. \qquad (5)$$

By focusing on the branching vector belonging to the root node, i.e., $(Q_\emptyset, N_\emptyset, C_1, C_2, \dots)$, we can see that the process $\{R^{(k)}\}$ satisfies the distributional equations

$$R^{(0)} \stackrel{\mathcal{D}}{=} Q_\emptyset = Q, \qquad R^{(k)} \stackrel{\mathcal{D}}{=} Q_\emptyset + \sum_{r=1}^{N_\emptyset} C_r \left( \sum_{j=1}^{k} \sum_{(r,\mathbf{i}) \in A_j} Q_{(r,\mathbf{i})} \Pi_{(r,\mathbf{i})} / C_r \right) = Q + \sum_{r=1}^{N} C_r R_r^{(k-1)}, \qquad k \geq 1, \qquad (6)$$

where the $R_r^{(k-1)}$ are i.i.d. copies of $R^{(k-1)}$, all independent of $(Q, N, C_1, C_2, \dots)$. Here and throughout the paper the convention is that $XY/Y \equiv X$ whenever $Y = 0$. Moreover, if we define

$$R = \sum_{j=0}^{\infty} \sum_{\mathbf{i} \in A_j} Q_\mathbf{i} \Pi_\mathbf{i}, \qquad (7)$$

we have the following result. We use $x \vee y$ to denote the maximum of $x$ and $y$, and write $\rho_\beta = E\left[\sum_{i=1}^{N} |C_i|^\beta\right]$.

Proposition 1. Let $\beta \geq 1$ be such that $E[|Q|^\beta] < \infty$ and $E[(\sum_{i=1}^{N} |C_i|)^\beta] < \infty$. In addition, assume either (i) $(\rho_1 \vee \rho_\beta) < 1$, or (ii) $\beta = 2$, $\rho_1 = 1$, $\rho_\beta < 1$ and $E[Q] = 0$. Then, there exist constants $K_\beta > 0$ and $0 < c_\beta < 1$ such that for $R^{(k)}$ and $R$ defined according to (5) and (7), respectively, we have

$$\sup_{k \geq 0} E\left[|R^{(k)}|^\beta\right] \leq K_\beta < \infty \qquad \text{and} \qquad E\left[|R - R^{(k)}|^\beta\right] \leq K_\beta c_\beta^{k+1}.$$

Proof. For the case $\rho_1 \vee \rho_\beta < 1$, Lemma 4.4 in Jelenković and Olvera-Cravioto (2012a) gives that for $W_n = \sum_{\mathbf{i} \in A_n} Q_\mathbf{i} \Pi_\mathbf{i}$ and some finite constant $H_\beta$ we have

$$E\left[|W_n|^\beta\right] \leq H_\beta (\rho_1 \vee \rho_\beta)^n.$$

Let $c_\beta = \rho_1 \vee \rho_\beta$. Minkowski's inequality then gives

$$\left\| R^{(k)} \right\|_\beta \leq \sum_{n=0}^{k} \|W_n\|_\beta \leq \sum_{n=0}^{\infty} \left( H_\beta c_\beta^n \right)^{1/\beta} = \frac{H_\beta^{1/\beta}}{1 - c_\beta^{1/\beta}} =: K_\beta^{1/\beta} < \infty.$$

Similarly,

$$\left\| R - R^{(k)} \right\|_\beta \leq \sum_{n=k+1}^{\infty} \|W_n\|_\beta \leq \sum_{n=k+1}^{\infty} \left( H_\beta c_\beta^n \right)^{1/\beta} = \frac{c_\beta^{(k+1)/\beta} H_\beta^{1/\beta}}{1 - c_\beta^{1/\beta}} = \left( K_\beta c_\beta^{k+1} \right)^{1/\beta}.$$

For the case $\beta = 2$, $\rho_1 = 1$, $\rho_\beta < 1$ and $E[Q] = 0$ we have that

$$E\left[W_n^2\right] = E\left[\left( \sum_{r=1}^{N_\emptyset} C_r W_{n-1,r} \right)^2\right] = E\left[ \sum_{r=1}^{N_\emptyset} C_r^2 (W_{n-1,r})^2 + \sum_{r \neq s} C_r C_s W_{n-1,r} W_{n-1,s} \right],$$

where $W_{n-1,r} = \sum_{(r,\mathbf{i}) \in A_n} Q_{(r,\mathbf{i})} \Pi_{(r,\mathbf{i})} / C_r$, and the $\{W_{n-1,r}\}_{r \geq 1}$ are i.i.d. copies of $W_{n-1}$, independent of $(N_\emptyset, C_1, C_2, \dots)$. Since $E[W_n] = 0$ for all $n \geq 0$, it follows that

$$E[W_n^2] = \rho_2 E[W_{n-1}^2] = \rho_2^n E[W_0^2] = \mathrm{Var}(Q) \rho_2^n.$$

The two results now follow from the same arguments used above with $H_2 = \mathrm{Var}(Q)$ and $c_2 = \rho_2$.

It follows from the previous result that under the conditions of Proposition 1, $R^{(k)}$ converges to $R$ both almost surely and in $L_\beta$-norm. Similarly, if we ignore the $Q$ in the generic branching vector, assume that $C_i \geq 0$ for all $i$, and define the process

$$W^{(k)} = \sum_{\mathbf{i} \in A_k} \Pi_\mathbf{i} \stackrel{\mathcal{D}}{=} \sum_{r=1}^{N_\emptyset} C_r \left( \sum_{(r,\mathbf{i}) \in A_k} \Pi_{(r,\mathbf{i})} / C_r \right) = \sum_{r=1}^{N} C_r W_r^{(k-1)},$$

where the $\{W_r^{(k-1)}\}_{r \geq 1}$ are i.i.d. copies of $W^{(k-1)}$ independent of $(N, C_1, C_2, \dots)$, then it can be shown that $\{W^{(k)}/\rho_1^k : k \geq 0\}$ defines a nonnegative martingale which converges almost surely to the endogenous solution of the stochastic fixed-point equation

$$W \stackrel{\mathcal{D}}{=} \sum_{i=1}^{N} \frac{C_i}{\rho_1} W_i,$$

where the $\{W_i\}_{i \geq 1}$ are i.i.d. copies of $W$, independent of $(N, C_1, C_2, \dots)$. We refer to this equation as the homogeneous case. As mentioned in the introduction, our objective is to generate a sample of $R^{(k)}$ for values of $k$ sufficiently large to suitably approximate $R$. Our proposed algorithm can also be used to simulate $W^{(k)}$, but due to space limitations we will omit the details.

3 THE ALGORITHM

Note that based on (5), one way to simulate $R^{(k)}$ would be to simulate a weighted branching process starting from the root and up to the $k$th generation and then add all the weights $Q_\mathbf{i} \Pi_\mathbf{i}$ for $\mathbf{i} \in \bigcup_{j=0}^{k} A_j$. Alternatively, we could generate a large enough pool of i.i.d. copies of $Q$, which would represent the $Q_\mathbf{i}$ for $\mathbf{i} \in A_k$, and use them to generate a pool of i.i.d. observations of $R^{(1)}$ by setting

$$R_i^{(1)} = \sum_{r=1}^{N_i} C_{(i,r)} R_r^{(0)} + Q_i,$$

where $\{(Q_i, N_i, C_{(i,1)}, C_{(i,2)}, \dots)\}_{i \geq 1}$ are i.i.d. copies of the generic branching vector, independent of everything else, and the $R_r^{(0)}$ are the $Q$'s generated in the previous step. We can continue this process until we get to the root node. On average, we would need $(E[N])^k$ i.i.d.
copies of $Q$ for the first pool of observations, $(E[N])^{k-1}$ copies of the generic branching vector for the second pool, and in general, $(E[N])^{k-j}$ for the $j$th step. This approach is equivalent to simulating the weighted branching process starting from the $k$th generation and going up to the root, and is the result of iterating (3). Our proposed algorithm is based on this leaves-to-root approach, but to avoid the need for a geometric number of leaves, we will resample from the initial pool to obtain a pool of the same size of observations of $R^{(1)}$. In general, for the $j$th generation we will sample from the pool obtained in the previous step of (approximate) observations of $R^{(j-1)}$ to obtain conditionally independent (approximate) copies of $R^{(j)}$. In other words, to obtain a pool of approximate copies of $R^{(j)}$ we bootstrap from the pool previously obtained of approximate copies of $R^{(j-1)}$. The approximation lies in the fact that we are not sampling from

$R^{(j-1)}$ itself, but from a finite sample of conditionally independent observations that are only approximately distributed as $R^{(j-1)}$. The algorithm is described below.

Let $(Q, N, C_1, C_2, \dots)$ denote the generic branching vector defining the weighted branching process. Let $k$ be the depth of the recursion that we want to simulate, i.e., the algorithm will produce a sample of random variables approximately distributed as $R^{(k)}$. Choose $m \in \mathbb{N}_+$ to be the bootstrap sample size. For each $0 \leq j \leq k$, the algorithm outputs $\mathcal{P}^{(j,m)} = \left( \hat R_1^{(j,m)}, \hat R_2^{(j,m)}, \dots, \hat R_m^{(j,m)} \right)$, which we refer to as the sample pool at level $j$.

1. Initialize: Set $j = 0$. Simulate a sequence $\{Q_i\}_{i=1}^{m}$ of i.i.d. copies of $Q$ and let $\hat R_i^{(0,m)} = Q_i$ for $i = 1, \dots, m$. Output $\mathcal{P}^{(0,m)} = \left( \hat R_1^{(0,m)}, \dots, \hat R_m^{(0,m)} \right)$ and update $j = 1$.
2. While $j \leq k$:
   (a) Simulate a sequence $\{(Q_i, N_i, C_{(i,1)}, C_{(i,2)}, \dots)\}_{i=1}^{m}$ of i.i.d. copies of the generic branching vector, independent of everything else.
   (b) Let
   $$\hat R_i^{(j,m)} = Q_i + \sum_{r=1}^{N_i} C_{(i,r)} \hat R_{(i,r)}^{(j-1,m)}, \qquad i = 1, \dots, m, \qquad (8)$$
   where the $\hat R_{(i,r)}^{(j-1,m)}$ are sampled uniformly with replacement from the pool $\mathcal{P}^{(j-1,m)}$.
   (c) Output $\mathcal{P}^{(j,m)} = \left( \hat R_1^{(j,m)}, \dots, \hat R_m^{(j,m)} \right)$ and update $j = j + 1$.

Remark 2. To simulate an approximation for the endogenous solution to the maximum equation (2), given by $R = \bigvee_{j=0}^{\infty} \bigvee_{\mathbf{i} \in A_j} Q_\mathbf{i} \Pi_\mathbf{i}$, simply replace (8) with

$$\hat R_i^{(j,m)} = Q_i \vee \bigvee_{r=1}^{N_i} C_{(i,r)} \hat R_{(i,r)}^{(j-1,m)}, \qquad i = 1, \dots, m.$$

Bootstrapping refers broadly to any method that relies on random sampling with replacement (Efron and Tibshirani 1993). For example, bootstrapping can be used to estimate the variance of an estimator by constructing samples of the estimator from a number of resamples of the original dataset with replacement. With the same idea, our algorithm draws samples uniformly with replacement from the previous bootstrap sample pool. Therefore, the $\hat R_{(i,r)}^{(j-1,m)}$ on the right-hand side of (8) are only conditionally independent given $\mathcal{P}^{(j-1,m)}$. Hence, the samples in $\mathcal{P}^{(j,m)}$ are identically distributed but not independent for $j \geq 1$.
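The steps above can be sketched in a few lines of Python (the language used for the experiments in Section 5). This is a minimal illustration rather than the authors' implementation; the distributions, borrowed from the first example of Section 5 ($Q \sim$ Uniform$[0,1]$, $N \sim$ Poisson(3), i.i.d. $C_i \sim$ Uniform$[0,0.2]$, mutually independent), are an assumption of the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_pool(k, m):
    """Return the level-k sample pool P^(k,m) of the algorithm above.

    Distributions are assumed for illustration: Q ~ Uniform[0,1],
    N ~ Poisson(3), i.i.d. C_i ~ Uniform[0,0.2]. Cost is O(m k E[N]).
    """
    pool = rng.uniform(0.0, 1.0, size=m)       # step 1: R-hat^(0,m)_i = Q_i
    for _ in range(k):                         # step 2: levels j = 1, ..., k
        new_pool = np.empty(m)
        for i in range(m):
            Q = rng.uniform(0.0, 1.0)
            N = rng.poisson(3.0)
            C = rng.uniform(0.0, 0.2, size=N)  # C_(i,1), ..., C_(i,N_i)
            # bootstrap step of (8): N_i uniform draws, with replacement,
            # from the previous level's pool
            picks = rng.choice(pool, size=N, replace=True)
            new_pool[i] = Q + np.dot(C, picks)
        pool = new_pool
    return pool

pool = bootstrap_pool(k=10, m=1000)
print(pool.mean())   # the pool mean approximates E[R^(10)]
```

For the maximum equation of Remark 2, one simply replaces the sum $Q + \sum_r C_r \hat R_r$ in the update by the maximum $Q \vee \max_r C_r \hat R_r$.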
As we mentioned earlier, the $\{\hat R_i^{(j,m)}\}$ in $\mathcal{P}^{(j,m)}$ are only approximately distributed as $R^{(j)}$, with the exception of the $\{\hat R_i^{(0,m)}\}$, which are exact. The first thing that we need to prove is that the distribution of the observations in $\mathcal{P}^{(j,m)}$ does indeed converge to that of $R^{(j)}$. Intuitively, this should be the case since the empirical distribution of the $\{\hat R_i^{(0,m)}\}$ is the empirical distribution of i.i.d. observations of $R^{(0)}$, and therefore should be close to the true distribution of $R^{(0)}$ for suitably large $m$. Similarly, since the $\{\hat R_i^{(1,m)}\}$ are constructed by sampling from the empirical distribution of $\mathcal{P}^{(0,m)}$, which is close to the true distribution of $R^{(0)}$, their empirical distribution should be close to the empirical distribution of $R^{(1)}$, which in turn should be close to the true distribution of $R^{(1)}$. Inductively, provided the approximation is good in step $j-1$, we can expect the empirical distribution of $\mathcal{P}^{(j,m)}$ to be close to the true distribution of $R^{(j)}$. In the following section we make the mode of convergence precise by considering the Kantorovich-Rubinstein distance between the empirical distribution of $\mathcal{P}^{(j,m)}$ and the true distribution of $R^{(j)}$.

The second technical aspect of our proposed algorithm is the lack of independence among the observations in $\mathcal{P}^{(k,m)}$, since a natural estimator for quantities of the form $E[h(R^{(k)})]$ would be

$$\frac{1}{m} \sum_{i=1}^{m} h\left( \hat R_i^{(k,m)} \right). \qquad (9)$$

Hence, we also provide a result establishing the consistency of estimators of the form in (9) for a suitable family of functions $h$. We conclude this section by pointing out that the complexity of the algorithm described above is of order $mk$, while the naive Monte Carlo approach described earlier, which consists of sampling i.i.d. copies of a weighted branching process up to the $k$th generation, has order $(E[N])^k$. This is a huge gain in efficiency.

4 CONVERGENCE AND CONSISTENCY

In order to show that our proposed algorithm does indeed produce observations that are approximately distributed as $R^{(k)}$ for any fixed $k$, we will show that the empirical distribution function of the observations in $\mathcal{P}^{(k,m)}$, i.e.,

$$\hat F_{k,m}(x) = \frac{1}{m} \sum_{i=1}^{m} 1\left( \hat R_i^{(k,m)} \leq x \right),$$

converges as $m \to \infty$ to the true distribution function of $R^{(k)}$, which we will denote by $F_k$. We will show this by using the Kantorovich-Rubinstein distance, which is a metric on the space of probability measures. In particular, convergence in this sense is equivalent to weak convergence plus convergence of the first absolute moments.

Definition 1. Let $M(\mu, \nu)$ denote the set of joint probability measures on $\mathbb{R} \times \mathbb{R}$ with marginals $\mu$ and $\nu$. Then, the Kantorovich-Rubinstein distance between $\mu$ and $\nu$ is given by

$$d_1(\mu, \nu) = \inf_{\pi \in M(\mu,\nu)} \int |x - y| \, d\pi(x, y).$$

We point out that $d_1$ is, strictly speaking, a distance only when both $\mu$ and $\nu$ have finite first absolute moments. Moreover, it is well known that

$$d_1(\mu, \nu) = \int_0^1 \left| F^{-1}(u) - G^{-1}(u) \right| du = \int_{-\infty}^{\infty} |F(x) - G(x)| \, dx, \qquad (10)$$

where $F$ and $G$ are the cumulative distribution functions of $\mu$ and $\nu$, respectively, and $f^{-1}(t) = \inf\{x \in \mathbb{R} : f(x) \geq t\}$ denotes the pseudo-inverse of $f$. It follows that the optimal coupling of two real random variables $X$ and $Y$ is given by $(X, Y) = (F^{-1}(U), G^{-1}(U))$, where $U$ is uniformly distributed in $[0,1]$.
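Between two empirical distributions of equal sample size, the first representation in (10) is easy to evaluate: both pseudo-inverses are step functions with jumps at $u = i/n$, so the integral reduces to an average of sorted differences. A small sketch (the function and sample names are illustrative):

```python
import numpy as np

def d1(x, y):
    """Kantorovich-Rubinstein distance between the empirical
    distributions of two equal-size samples, via (10): the integral of
    |F^{-1}(u) - G^{-1}(u)| becomes a mean of sorted differences."""
    x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    if len(x) != len(y):
        raise ValueError("equal sample sizes assumed in this sketch")
    return float(np.mean(np.abs(x - y)))

rng = np.random.default_rng(3)
a = rng.normal(size=2000)
print(d1(a, a + 1.0))   # shifting a sample by 1 moves mass a distance of 1
```

For samples of unequal sizes, `scipy.stats.wasserstein_distance` computes the same order-1 distance.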
Remark 3. The Kantorovich-Rubinstein distance is also known as the Wasserstein metric of order 1. In general, both the Kantorovich-Rubinstein distance and the more general Wasserstein metric of order $p$ can be defined on any metric space; we restrict our definition in this paper to the real line since that is all we need. We refer the interested reader to Villani (2009) for more details. With some abuse of notation, for two distribution functions $F$ and $G$ we use $d_1(F, G)$ to denote the Kantorovich-Rubinstein distance between their corresponding probability measures.

The following proposition shows that for i.i.d. samples, the expected value of the Kantorovich-Rubinstein distance between the empirical distribution function and the true distribution converges to zero.

Proposition 4. Let $\{X_i\}_{i \geq 1}$ be a sequence of i.i.d. random variables with common distribution $F$. Let $F_n$ denote the empirical distribution function of a sample of size $n$. Then, provided there exists $\alpha \in (1, 2)$

such that $E[|X|^\alpha] < \infty$, we have that

$$E\left[d_1(F_n, F)\right] \leq n^{-1+1/\alpha} C_\alpha E[|X|^\alpha],$$

where $C_\alpha < \infty$ is a constant depending only on $\alpha$.

Proposition 4 can be proved following the same arguments used in the proof of Theorem 2.2 in del Barrio, Giné, and Matrán (1999) by setting $M = 1$, and thus we omit it. We now give the main theorem of the paper, which establishes the convergence of the expected Kantorovich-Rubinstein distance between $\hat F_{k,m}$ and $F_k$. Its proof is based on induction and the explicit representation (10). Recall that $\rho_\beta = E\left[\sum_{i=1}^{N} |C_i|^\beta\right]$.

Theorem 5. Suppose that the conditions of Proposition 1 are satisfied for some $\beta > 1$. Then, for any $\alpha \in (1, 2)$ with $\alpha \leq \beta$, there exists a constant $K_\alpha < \infty$ such that

$$E\left[ d_1(\hat F_{k,m}, F_k) \right] \leq K_\alpha m^{-1+1/\alpha} \sum_{i=0}^{k} \rho_1^i. \qquad (11)$$

Proof. By Proposition 1 there exists a constant $H_\alpha$ such that

$$H_\alpha = \sup_{k \geq 0} E\left[|R^{(k)}|^\alpha\right] \leq \left( \sup_{k \geq 0} E\left[|R^{(k)}|^\beta\right] \right)^{\alpha/\beta} < \infty.$$

Set $K_\alpha = H_\alpha C_\alpha$, with $C_\alpha$ the constant of Proposition 4. We will give a proof by induction. For $j = 0$, we have that $\hat F_{0,m}(x) = \frac{1}{m} \sum_{i=1}^{m} 1(Q_i \leq x)$, where $\{Q_i\}_{i \geq 1}$ is a sequence of i.i.d. copies of $Q$. It follows that $\hat F_{0,m}$ is the empirical distribution function of $R^{(0)}$, and by Proposition 4 we have that

$$E\left[d_1(\hat F_{0,m}, F_0)\right] \leq K_\alpha m^{-1+1/\alpha}.$$

Now suppose that (11) holds for $j - 1$. Let $\{U_r^i\}_{i,r \geq 1}$ be a sequence of i.i.d. Uniform(0,1) random variables, independent of everything else. Let $\{(Q_i, N_i, C_{(i,1)}, C_{(i,2)}, \dots)\}_{i \geq 1}$ be a sequence of i.i.d. copies of the generic branching vector, also independent of everything else. Recall that $F_{j-1}$ is the distribution function of $R^{(j-1)}$ and define the random variables

$$\hat R_i^{(j,m)} = \sum_{r=1}^{N_i} C_{(i,r)} \hat F_{j-1,m}^{-1}(U_r^i) + Q_i \qquad \text{and} \qquad R_i^{(j)} = \sum_{r=1}^{N_i} C_{(i,r)} F_{j-1}^{-1}(U_r^i) + Q_i$$

for each $i = 1, 2, \dots, m$. Now use these random variables to define

$$\hat F_{j,m}(x) = \frac{1}{m} \sum_{i=1}^{m} 1\left( \hat R_i^{(j,m)} \leq x \right) \qquad \text{and} \qquad F_{j,m}(x) = \frac{1}{m} \sum_{i=1}^{m} 1\left( R_i^{(j)} \leq x \right).$$

Note that $F_{j,m}$ is an empirical distribution function of i.i.d. copies of $R^{(j)}$, which has been carefully coupled with the function $\hat F_{j,m}$ produced by the algorithm. By the triangle inequality and Proposition 4 we have that

$$E\left[d_1(\hat F_{j,m}, F_j)\right] \leq E\left[d_1(\hat F_{j,m}, F_{j,m})\right] + E\left[d_1(F_{j,m}, F_j)\right] \leq E\left[d_1(\hat F_{j,m}, F_{j,m})\right] + K_\alpha m^{-1+1/\alpha}.$$
To analyze the remaining expectation note that

$$E\left[d_1(\hat F_{j,m}, F_{j,m})\right] = E\left[ \int_{-\infty}^{\infty} \left| \hat F_{j,m}(x) - F_{j,m}(x) \right| dx \right] \leq \frac{1}{m} \sum_{i=1}^{m} E\left[ \left| \hat R_i^{(j,m)} - R_i^{(j)} \right| \right] = \frac{1}{m} \sum_{i=1}^{m} E\left[ \left| \sum_{r=1}^{N_i} C_{(i,r)} \left( \hat F_{j-1,m}^{-1}(U_r^i) - F_{j-1}^{-1}(U_r^i) \right) \right| \right] \leq E\left[ \sum_{r=1}^{N} |C_r| \right] E\left[ d_1(\hat F_{j-1,m}, F_{j-1}) \right] = \rho_1 E\left[ d_1(\hat F_{j-1,m}, F_{j-1}) \right],$$

where in the last step we used the fact that $(N_i, C_{(i,1)}, C_{(i,2)}, \dots)$ is independent of $\{U_r^i\}_{r \geq 1}$ and of $\hat F_{j-1,m}$, combined with the explicit representation of the Kantorovich-Rubinstein distance given in (10). The induction hypothesis now gives

$$E\left[d_1(\hat F_{j,m}, F_j)\right] \leq \rho_1 E\left[d_1(\hat F_{j-1,m}, F_{j-1})\right] + K_\alpha m^{-1+1/\alpha} \leq K_\alpha m^{-1+1/\alpha} \rho_1 \sum_{i=0}^{j-1} \rho_1^i + K_\alpha m^{-1+1/\alpha} = K_\alpha m^{-1+1/\alpha} \sum_{i=0}^{j} \rho_1^i.$$

This completes the proof.

Note that the proof of Theorem 5 implies that

$$\hat R_i^{(j,m)} \to R_i^{(j)} = \sum_{r=1}^{N_i} C_{(i,r)} F_{j-1}^{-1}(U_r^i) + Q_i \stackrel{\mathcal{D}}{=} R^{(j)}$$

in $L_1$-norm as $m \to \infty$, for all fixed $j \in \mathbb{N}$, and hence in distribution. In other words,

$$P\left( \hat R_i^{(k,m)} \leq x \right) \to F_k(x) \qquad \text{as } m \to \infty, \qquad (12)$$

for all $i = 1, 2, \dots, m$, and for any continuity point of $F_k$. This also implies that

$$E\left[ \hat F_{k,m}(x) \right] = P\left( \hat R_1^{(k,m)} \leq x \right) \to F_k(x) \qquad \text{as } m \to \infty, \qquad (13)$$

for all continuity points of $F_k$.

Since our algorithm produces a pool $\mathcal{P}^{(k,m)}$ of random variables approximately distributed according to $F_k$, it makes sense to use it for estimating expectations related to $R^{(k)}$. In particular, we are interested in estimators of the form in (9). The problem with this kind of estimator is that the random variables in $\mathcal{P}^{(k,m)}$ are only conditionally independent given $\hat F_{k-1,m}$.

Definition 2. We say that $\Theta_n$ is a consistent estimator for $\theta$ if $\Theta_n \stackrel{P}{\to} \theta$ as $n \to \infty$, where $\stackrel{P}{\to}$ denotes convergence in probability.

Our second theorem shows the consistency of estimators of the form in (9) for a broad class of functions.

Theorem 6. Suppose that the conditions of Proposition 1 are satisfied for some $\beta > 1$. Suppose $h : \mathbb{R} \to \mathbb{R}$ is continuous and $|h(x)| \leq C(1 + |x|)$ for all $x$ and some constant $C > 0$. Then, the estimator

$$\frac{1}{m} \sum_{i=1}^{m} h\left( \hat R_i^{(k,m)} \right) = \int_{-\infty}^{\infty} h(x) \, d\hat F_{k,m}(x),$$

where $\mathcal{P}^{(k,m)} = \left( \hat R_1^{(k,m)}, \dots, \hat R_m^{(k,m)} \right)$, is a consistent estimator for $E[h(R^{(k)})]$.

Proof. For any $M > 0$, define $h_M(x)$ as

$$h_M(x) = h(-M) 1(x \leq -M) + h(x) 1(-M < x \leq M) + h(M) 1(x > M),$$

and note that $h_M$ is uniformly continuous. We then have

$$\left| \int h(x) \, d\hat F_{k,m}(x) - \int h(x) \, dF_k(x) \right| \leq 2C \int_{|x| > M} (1 + |x|) \, dF_k(x) + 2C \int_{|x| > M} (1 + |x|) \, d\hat F_{k,m}(x) + \left| \int h_M(x) \, d\hat F_{k,m}(x) - \int h_M(x) \, dF_k(x) \right|. \qquad (14)$$

Fix $\varepsilon > 0$ and choose $M_\varepsilon > 0$ such that $E\left[ (|R^{(k)}| + 1) 1(|R^{(k)}| > M_\varepsilon) \right] \leq \varepsilon/(4C)$ and such that $M_\varepsilon$ and $-M_\varepsilon$ are continuity points of $F_k$. Define $(\hat R^{(k,m)}, R^{(k)}) = (\hat F_{k,m}^{-1}(U), F_k^{-1}(U))$, where $U$ is a uniform $[0,1]$ random variable independent of $\mathcal{P}^{(k,m)}$. Next, note that $g(x) = 1 + |x|$ is Lipschitz continuous with Lipschitz constant one, and therefore

$$\int_{|x| > M_\varepsilon} (1 + |x|) \, d\hat F_{k,m}(x) = (1 + M_\varepsilon) \left( \hat F_{k,m}(-M_\varepsilon) + 1 - \hat F_{k,m}(M_\varepsilon) \right) + \int_{x < -M_\varepsilon} \hat F_{k,m}(x) \, dx + \int_{x > M_\varepsilon} \left( 1 - \hat F_{k,m}(x) \right) dx$$
$$\leq (1 + M_\varepsilon) \left( \hat F_{k,m}(-M_\varepsilon) + 1 - \hat F_{k,m}(M_\varepsilon) \right) + d_1(\hat F_{k,m}, F_k) + \int_{x < -M_\varepsilon} F_k(x) \, dx + \int_{x > M_\varepsilon} \left( 1 - F_k(x) \right) dx$$
$$= (1 + M_\varepsilon) \left( \hat F_{k,m}(-M_\varepsilon) - F_k(-M_\varepsilon) + F_k(M_\varepsilon) - \hat F_{k,m}(M_\varepsilon) \right) + d_1(\hat F_{k,m}, F_k) + E\left[ (|R^{(k)}| + 1) 1(|R^{(k)}| > M_\varepsilon) \right].$$

Finally, since $h_{M_\varepsilon}$ is bounded and uniformly continuous, $\omega(\delta) = \sup\{ |h_{M_\varepsilon}(x) - h_{M_\varepsilon}(y)| : |x - y| \leq \delta \}$ converges to zero as $\delta \to 0$. Hence, for any $\gamma > 0$,

$$\left| \int h_{M_\varepsilon}(x) \, d\hat F_{k,m}(x) - \int h_{M_\varepsilon}(x) \, dF_k(x) \right| \leq E\left[ \left| h_{M_\varepsilon}(\hat R^{(k,m)}) - h_{M_\varepsilon}(R^{(k)}) \right| \,\middle|\, \hat F_{k,m} \right] \leq \omega(m^{-\gamma}) + K_\varepsilon E\left[ 1\left( |\hat R^{(k,m)} - R^{(k)}| > m^{-\gamma} \right) \,\middle|\, \hat F_{k,m} \right] \leq \omega(m^{-\gamma}) + K_\varepsilon m^{\gamma} d_1(\hat F_{k,m}, F_k),$$

where $K_\varepsilon = 2 \sup\{ |h_{M_\varepsilon}(x)| : x \in \mathbb{R} \}$. Choose $0 < \gamma < 1 - 1/\alpha$ for the $\alpha \in (1,2)$ in Theorem 5 and combine the previous estimates to obtain

$$E\left[ \left| \int h(x) \, d\hat F_{k,m}(x) - \int h(x) \, dF_k(x) \right| \right] \leq 2C(1 + M_\varepsilon) \left( E[\hat F_{k,m}(-M_\varepsilon)] - F_k(-M_\varepsilon) + F_k(M_\varepsilon) - E[\hat F_{k,m}(M_\varepsilon)] \right) + \varepsilon + \omega(m^{-\gamma}) + (2C + K_\varepsilon m^{\gamma}) E\left[ d_1(\hat F_{k,m}, F_k) \right].$$

Since $E[\hat F_{k,m}(-M_\varepsilon)] \to F_k(-M_\varepsilon)$ and $E[\hat F_{k,m}(M_\varepsilon)] \to F_k(M_\varepsilon)$ by (13), and $m^{\gamma} E[d_1(\hat F_{k,m}, F_k)] \to 0$ by Theorem 5, it follows that

$$\limsup_{m \to \infty} E\left[ \left| \int h(x) \, d\hat F_{k,m}(x) - \int h(x) \, dF_k(x) \right| \right] \leq \varepsilon.$$

Since $\varepsilon > 0$ was arbitrary, the convergence in $L_1$, and therefore in probability, follows.

5 NUMERICAL EXAMPLES

This last section of the paper gives numerical examples to illustrate the performance of our algorithm. Consider a generic branching vector $(Q, N, C_1, C_2, \dots)$ where the $\{C_i\}_{i \geq 1}$ are i.i.d. and independent of $N$ and $Q$, with $N$ also independent of $Q$.
Figure 2a plots the empirical cumulative distribution function of 1000 samples of $R^{(10)}$, i.e., $F_{10,1000}$ in our notation, versus the functions $\hat F_{10,200}$ and $\hat F_{10,1000}$ produced by our algorithm, for the case where the $C_i$ are uniformly distributed in $[0, 0.2]$, $Q$ is uniformly distributed in $[0, 1]$ and $N$ is a Poisson random variable with mean 3. Note that we cannot compare our results with the true distribution $F_{10}$ since it is not available in closed form. Computing $F_{10,1000}$ required … seconds using Python with an Intel i7-4700MQ 2.40 GHz processor and 8 GB of memory, while computing $\hat F_{10,1000}$ required only 2.1 seconds. We point out

that in applications to information ranking algorithms $E[N]$ can be in the thirties range, which would make the difference in computation time even more impressive.

Figure 2: Numerical examples. (a) The functions $F_{10,1000}(x)$, $\hat F_{10,200}(x)$ and $\hat F_{10,1000}(x)$. (b) The functions $F_{10,10000}(x)$, $\hat F_{10,10000}(x)$ and $G_{10}(x)$, where $G_{10}$ is evaluated only at integer values of $x$ and linearly interpolated in between.

Our second example plots the tail of the empirical cumulative distribution function of $R^{(10)}$ for 10,000 samples versus the tail of $\hat F_{10,10000}$ for an example where $N$ is a zeta random variable with probability mass function $P(N = k) \propto k^{-2.5}$, $Q$ is an exponential random variable with mean 1, and the $C_i$ have a uniform distribution in $[0, 0.5]$. In this case the exact asymptotics for $P(R^{(k)} > x)$ as $x \to \infty$ are given by

$$P(R^{(k)} > x) \sim (E[C] E[Q])^{\alpha} (1 - \rho_1)^{-\alpha} \sum_{j=0}^{k} \rho_\alpha^j \left( 1 - \rho_1^{k-j} \right)^{\alpha} P(N > x),$$

where $P(N > x) = x^{-\alpha} L(x)$ is regularly varying (see Lemma 5.1 in Jelenković and Olvera-Cravioto (2010)), which reduces for the specific distributions we have chosen to

$$G_{10}(x) = (0.25)^{2.5} (1 - 0.49)^{-2.5} \sum_{j=0}^{10} (0.07)^j \left( 1 - (0.49)^{10-j} \right)^{2.5} P(N > x) = (0.365) P(N > x).$$

Figure 2b plots the complementary distributions of $F_{10,10000}$ and $\hat F_{10,10000}$ and compares them to $G_{10}$. We can see that the tails of both $F_{10,10000}$ and $\hat F_{10,10000}$ approach the asymptotic roughly at the same time.

REFERENCES

Aldous, D., and A. Bandyopadhyay. 2005. "A Survey of Max-Type Recursive Distributional Equations". Annals of Applied Probability 15 (2).
Alsmeyer, G., J. Biggins, and M. Meiners. 2012. "The Functional Equation of the Smoothing Transform". Ann. Probab. 40 (5).
Alsmeyer, G., and M. Meiners. 2012. "Fixed Points of Inhomogeneous Smoothing Transforms". J. Differ. Equ. Appl. 18 (8).

Alsmeyer, G., and M. Meiners. 2013. "Fixed Points of the Smoothing Transform: Two-Sided Solutions". Probab. Theory Rel. 155 (1-2).
Biggins, J. 1998. "Lindley-Type Equations in the Branching Random Walk". Stochastic Process. Appl. 75.
Chen, N., N. Litvak, and M. Olvera-Cravioto. 2014. "Ranking Algorithms on Directed Configuration Networks". ArXiv preprint.
del Barrio, E., E. Giné, and C. Matrán. 1999. "Central Limit Theorems for the Wasserstein Distance Between the Empirical and the True Distributions". Annals of Probability.
Efron, B., and R. J. Tibshirani. 1993. An Introduction to the Bootstrap.
Fill, J., and S. Janson. 2001. "Approximating the Limiting Quicksort Distribution". Random Structures Algorithms 19 (3-4).
Jelenković, P., and M. Olvera-Cravioto. 2010. "Information Ranking and Power Laws on Trees". Adv. Appl. Prob. 42 (4).
Jelenković, P., and M. Olvera-Cravioto. 2012a. "Implicit Renewal Theorem for Trees with General Weights". Stochastic Process. Appl. 122 (9).
Jelenković, P., and M. Olvera-Cravioto. 2012b. "Implicit Renewal Theory and Power Tails on Trees". Adv. Appl. Prob. 44 (2).
Jelenković, P., and M. Olvera-Cravioto. 2015. "Maximums on Trees". Stochastic Process. Appl. 125.
Karpelevich, F. I., M. Y. Kelbert, and Y. M. Suhov. 1994. "Higher-Order Lindley Equations". Stochastic Processes and their Applications 53 (1).
Mezard, M., and A. Montanari. 2009. Information, Physics, and Computation. Oxford University Press.
Olvera-Cravioto, M. 2012. "Tail Behavior of Solutions of Linear Recursions on Trees". Stochastic Process. Appl. 122 (4).
Olvera-Cravioto, M., and O. Ruiz-Lacedelli. 2014. "Parallel Queues with Synchronization". ArXiv preprint.
Rösler, U. 1991. "A Limit Theorem for Quicksort". RAIRO Theor. Infor. Appl. 25.
Rösler, U., and L. Rüschendorf. 2001. "The Contraction Method for Recursive Algorithms". Algorithmica 29 (1-2).
Villani, C. 2009. Optimal Transport, Old and New. New York: Springer.
Volkovich, Y., and N. Litvak. 2010. "Asymptotic Analysis for Personalized Web Search". Adv. Appl. Prob. 42 (2).

AUTHOR BIOGRAPHIES

NINGYUAN CHEN is a Ph.D. student in the IEOR Department at Columbia University. He obtained his B.S.
in Mathematics from Peking University and an M.S. in Operations Research from Columbia University. His research focuses on two areas: (1) Applied Probability, in particular, random graphs and information ranking algorithms, and (2) strategic behavior of customers under market microstructure, with interfaces with dynamic pricing, stochastic modeling and optimization. His email address is nc2462@columbia.edu.

MARIANA OLVERA-CRAVIOTO is an Associate Professor in the IEOR Department at Columbia University. She obtained her Ph.D. in Management Science and Engineering from Stanford University and holds an M.S. in Statistics from the same university. Her research interests are in Applied Probability, in particular, the analysis of information ranking algorithms, queueing theory, random graphs, weighted branching processes and heavy-tailed asymptotics in general. She serves on the editorial boards of Stochastic Models and QUESTA. Her email address is olvera@ieor.columbia.edu.


More information

Bootstrapping Dependent Data

Bootstrapping Dependent Data Bootstrapping Dependent Data One of the key issues confronting bootstrap resapling approxiations is how to deal with dependent data. Consider a sequence fx t g n t= of dependent rando variables. Clearly

More information

A Simple Regression Problem

A Simple Regression Problem A Siple Regression Proble R. M. Castro March 23, 2 In this brief note a siple regression proble will be introduced, illustrating clearly the bias-variance tradeoff. Let Y i f(x i ) + W i, i,..., n, where

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

The degree of a typical vertex in generalized random intersection graph models

The degree of a typical vertex in generalized random intersection graph models Discrete Matheatics 306 006 15 165 www.elsevier.co/locate/disc The degree of a typical vertex in generalized rando intersection graph odels Jerzy Jaworski a, Michał Karoński a, Dudley Stark b a Departent

More information

A note on the multiplication of sparse matrices

A note on the multiplication of sparse matrices Cent. Eur. J. Cop. Sci. 41) 2014 1-11 DOI: 10.2478/s13537-014-0201-x Central European Journal of Coputer Science A note on the ultiplication of sparse atrices Research Article Keivan Borna 12, Sohrab Aboozarkhani

More information

COS 424: Interacting with Data. Written Exercises

COS 424: Interacting with Data. Written Exercises COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well

More information

Supplement to: Subsampling Methods for Persistent Homology

Supplement to: Subsampling Methods for Persistent Homology Suppleent to: Subsapling Methods for Persistent Hoology A. Technical results In this section, we present soe technical results that will be used to prove the ain theores. First, we expand the notation

More information

IN modern society that various systems have become more

IN modern society that various systems have become more Developent of Reliability Function in -Coponent Standby Redundant Syste with Priority Based on Maxiu Entropy Principle Ryosuke Hirata, Ikuo Arizono, Ryosuke Toohiro, Satoshi Oigawa, and Yasuhiko Takeoto

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

Lecture October 23. Scribes: Ruixin Qiang and Alana Shine

Lecture October 23. Scribes: Ruixin Qiang and Alana Shine CSCI699: Topics in Learning and Gae Theory Lecture October 23 Lecturer: Ilias Scribes: Ruixin Qiang and Alana Shine Today s topic is auction with saples. 1 Introduction to auctions Definition 1. In a single

More information

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution Testing approxiate norality of an estiator using the estiated MSE and bias with an application to the shape paraeter of the generalized Pareto distribution J. Martin van Zyl Abstract In this work the norality

More information

Fairness via priority scheduling

Fairness via priority scheduling Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation

More information

Supplementary Material for Fast and Provable Algorithms for Spectrally Sparse Signal Reconstruction via Low-Rank Hankel Matrix Completion

Supplementary Material for Fast and Provable Algorithms for Spectrally Sparse Signal Reconstruction via Low-Rank Hankel Matrix Completion Suppleentary Material for Fast and Provable Algoriths for Spectrally Sparse Signal Reconstruction via Low-Ran Hanel Matrix Copletion Jian-Feng Cai Tianing Wang Ke Wei March 1, 017 Abstract We establish

More information

Stochastic Subgradient Methods

Stochastic Subgradient Methods Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,

More information

Research Article On the Isolated Vertices and Connectivity in Random Intersection Graphs

Research Article On the Isolated Vertices and Connectivity in Random Intersection Graphs International Cobinatorics Volue 2011, Article ID 872703, 9 pages doi:10.1155/2011/872703 Research Article On the Isolated Vertices and Connectivity in Rando Intersection Graphs Yilun Shang Institute for

More information

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 2: PAC Learning and VC Theory I Fro Adversarial Online to Statistical Three reasons to ove fro worst-case deterinistic

More information

arxiv: v1 [cs.ds] 3 Feb 2014

arxiv: v1 [cs.ds] 3 Feb 2014 arxiv:40.043v [cs.ds] 3 Feb 04 A Bound on the Expected Optiality of Rando Feasible Solutions to Cobinatorial Optiization Probles Evan A. Sultani The Johns Hopins University APL evan@sultani.co http://www.sultani.co/

More information

Fixed-to-Variable Length Distribution Matching

Fixed-to-Variable Length Distribution Matching Fixed-to-Variable Length Distribution Matching Rana Ali Ajad and Georg Böcherer Institute for Counications Engineering Technische Universität München, Gerany Eail: raa2463@gail.co,georg.boecherer@tu.de

More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies OPERATIONS RESEARCH Vol. 52, No. 5, Septeber October 2004, pp. 795 803 issn 0030-364X eissn 1526-5463 04 5205 0795 infors doi 10.1287/opre.1040.0130 2004 INFORMS TECHNICAL NOTE Lost-Sales Probles with

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

Ph 20.3 Numerical Solution of Ordinary Differential Equations

Ph 20.3 Numerical Solution of Ordinary Differential Equations Ph 20.3 Nuerical Solution of Ordinary Differential Equations Due: Week 5 -v20170314- This Assignent So far, your assignents have tried to failiarize you with the hardware and software in the Physics Coputing

More information

The Weierstrass Approximation Theorem

The Weierstrass Approximation Theorem 36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

Biostatistics Department Technical Report

Biostatistics Department Technical Report Biostatistics Departent Technical Report BST006-00 Estiation of Prevalence by Pool Screening With Equal Sized Pools and a egative Binoial Sapling Model Charles R. Katholi, Ph.D. Eeritus Professor Departent

More information

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES S. E. Ahed, R. J. Tokins and A. I. Volodin Departent of Matheatics and Statistics University of Regina Regina,

More information

Statistics and Probability Letters

Statistics and Probability Letters Statistics and Probability Letters 79 2009 223 233 Contents lists available at ScienceDirect Statistics and Probability Letters journal hoepage: www.elsevier.co/locate/stapro A CLT for a one-diensional

More information

Asynchronous Gossip Algorithms for Stochastic Optimization

Asynchronous Gossip Algorithms for Stochastic Optimization Asynchronous Gossip Algoriths for Stochastic Optiization S. Sundhar Ra ECE Dept. University of Illinois Urbana, IL 680 ssrini@illinois.edu A. Nedić IESE Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu

More information

are equal to zero, where, q = p 1. For each gene j, the pairwise null and alternative hypotheses are,

are equal to zero, where, q = p 1. For each gene j, the pairwise null and alternative hypotheses are, Page of 8 Suppleentary Materials: A ultiple testing procedure for ulti-diensional pairwise coparisons with application to gene expression studies Anjana Grandhi, Wenge Guo, Shyaal D. Peddada S Notations

More information

On Conditions for Linearity of Optimal Estimation

On Conditions for Linearity of Optimal Estimation On Conditions for Linearity of Optial Estiation Erah Akyol, Kuar Viswanatha and Kenneth Rose {eakyol, kuar, rose}@ece.ucsb.edu Departent of Electrical and Coputer Engineering University of California at

More information

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vol. 57, No. 3, 2009 Algoriths for parallel processor scheduling with distinct due windows and unit-tie obs A. JANIAK 1, W.A. JANIAK 2, and

More information

Ensemble Based on Data Envelopment Analysis

Ensemble Based on Data Envelopment Analysis Enseble Based on Data Envelopent Analysis So Young Sohn & Hong Choi Departent of Coputer Science & Industrial Systes Engineering, Yonsei University, Seoul, Korea Tel) 82-2-223-404, Fax) 82-2- 364-7807

More information

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval Unifor Approxiation and Bernstein Polynoials with Coefficients in the Unit Interval Weiang Qian and Marc D. Riedel Electrical and Coputer Engineering, University of Minnesota 200 Union St. S.E. Minneapolis,

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

AN OPTIMAL SHRINKAGE FACTOR IN PREDICTION OF ORDERED RANDOM EFFECTS

AN OPTIMAL SHRINKAGE FACTOR IN PREDICTION OF ORDERED RANDOM EFFECTS Statistica Sinica 6 016, 1709-178 doi:http://dx.doi.org/10.5705/ss.0014.0034 AN OPTIMAL SHRINKAGE FACTOR IN PREDICTION OF ORDERED RANDOM EFFECTS Nilabja Guha 1, Anindya Roy, Yaakov Malinovsky and Gauri

More information

Some Proofs: This section provides proofs of some theoretical results in section 3.

Some Proofs: This section provides proofs of some theoretical results in section 3. Testing Jups via False Discovery Rate Control Yu-Min Yen. Institute of Econoics, Acadeia Sinica, Taipei, Taiwan. E-ail: YMYEN@econ.sinica.edu.tw. SUPPLEMENTARY MATERIALS Suppleentary Materials contain

More information

Testing equality of variances for multiple univariate normal populations

Testing equality of variances for multiple univariate normal populations University of Wollongong Research Online Centre for Statistical & Survey Methodology Working Paper Series Faculty of Engineering and Inforation Sciences 0 esting equality of variances for ultiple univariate

More information

arxiv: v1 [math.nt] 14 Sep 2014

arxiv: v1 [math.nt] 14 Sep 2014 ROTATION REMAINDERS P. JAMESON GRABER, WASHINGTON AND LEE UNIVERSITY 08 arxiv:1409.411v1 [ath.nt] 14 Sep 014 Abstract. We study properties of an array of nubers, called the triangle, in which each row

More information

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

Sharp Time Data Tradeoffs for Linear Inverse Problems

Sharp Time Data Tradeoffs for Linear Inverse Problems Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used

More information

1 Bounding the Margin

1 Bounding the Margin COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost

More information

Pseudo-marginal Metropolis-Hastings: a simple explanation and (partial) review of theory

Pseudo-marginal Metropolis-Hastings: a simple explanation and (partial) review of theory Pseudo-arginal Metropolis-Hastings: a siple explanation and (partial) review of theory Chris Sherlock Motivation Iagine a stochastic process V which arises fro soe distribution with density p(v θ ). Iagine

More information

1 Proof of learning bounds

1 Proof of learning bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a

More information

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks Bounds on the Miniax Rate for Estiating a Prior over a VC Class fro Independent Learning Tasks Liu Yang Steve Hanneke Jaie Carbonell Deceber 01 CMU-ML-1-11 School of Coputer Science Carnegie Mellon University

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

SPECTRUM sensing is a core concept of cognitive radio

SPECTRUM sensing is a core concept of cognitive radio World Acadey of Science, Engineering and Technology International Journal of Electronics and Counication Engineering Vol:6, o:2, 202 Efficient Detection Using Sequential Probability Ratio Test in Mobile

More information

Support recovery in compressed sensing: An estimation theoretic approach

Support recovery in compressed sensing: An estimation theoretic approach Support recovery in copressed sensing: An estiation theoretic approach Ain Karbasi, Ali Horati, Soheil Mohajer, Martin Vetterli School of Coputer and Counication Sciences École Polytechnique Fédérale de

More information

Analyzing Simulation Results

Analyzing Simulation Results Analyzing Siulation Results Dr. John Mellor-Cruey Departent of Coputer Science Rice University johnc@cs.rice.edu COMP 528 Lecture 20 31 March 2005 Topics for Today Model verification Model validation Transient

More information

List Scheduling and LPT Oliver Braun (09/05/2017)

List Scheduling and LPT Oliver Braun (09/05/2017) List Scheduling and LPT Oliver Braun (09/05/207) We investigate the classical scheduling proble P ax where a set of n independent jobs has to be processed on 2 parallel and identical processors (achines)

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

A method to determine relative stroke detection efficiencies from multiplicity distributions

A method to determine relative stroke detection efficiencies from multiplicity distributions A ethod to deterine relative stroke detection eiciencies ro ultiplicity distributions Schulz W. and Cuins K. 2. Austrian Lightning Detection and Inoration Syste (ALDIS), Kahlenberger Str.2A, 90 Vienna,

More information

Ştefan ŞTEFĂNESCU * is the minimum global value for the function h (x)

Ştefan ŞTEFĂNESCU * is the minimum global value for the function h (x) 7Applying Nelder Mead s Optiization Algorith APPLYING NELDER MEAD S OPTIMIZATION ALGORITHM FOR MULTIPLE GLOBAL MINIMA Abstract Ştefan ŞTEFĂNESCU * The iterative deterinistic optiization ethod could not

More information

Distance Optimal Target Assignment in Robotic Networks under Communication and Sensing Constraints

Distance Optimal Target Assignment in Robotic Networks under Communication and Sensing Constraints Distance Optial Target Assignent in Robotic Networks under Counication and Sensing Constraints Jingjin Yu CSAIL @ MIT/MechE @ BU Soon-Jo Chung Petros G. Voulgaris AE @ University of Illinois Supported

More information

An Improved Particle Filter with Applications in Ballistic Target Tracking

An Improved Particle Filter with Applications in Ballistic Target Tracking Sensors & ransducers Vol. 72 Issue 6 June 204 pp. 96-20 Sensors & ransducers 204 by IFSA Publishing S. L. http://www.sensorsportal.co An Iproved Particle Filter with Applications in Ballistic arget racing

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

Robustness and Regularization of Support Vector Machines

Robustness and Regularization of Support Vector Machines Robustness and Regularization of Support Vector Machines Huan Xu ECE, McGill University Montreal, QC, Canada xuhuan@ci.cgill.ca Constantine Caraanis ECE, The University of Texas at Austin Austin, TX, USA

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Coputable Shell Decoposition Bounds John Langford TTI-Chicago jcl@cs.cu.edu David McAllester TTI-Chicago dac@autoreason.co Editor: Leslie Pack Kaelbling and David Cohn Abstract Haussler, Kearns, Seung

More information

1 Rademacher Complexity Bounds

1 Rademacher Complexity Bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #10 Scribe: Max Goer March 07, 2013 1 Radeacher Coplexity Bounds Recall the following theore fro last lecture: Theore 1. With probability

More information

Constrained Consensus and Optimization in Multi-Agent Networks arxiv: v2 [math.oc] 17 Dec 2008

Constrained Consensus and Optimization in Multi-Agent Networks arxiv: v2 [math.oc] 17 Dec 2008 LIDS Report 2779 1 Constrained Consensus and Optiization in Multi-Agent Networks arxiv:0802.3922v2 [ath.oc] 17 Dec 2008 Angelia Nedić, Asuan Ozdaglar, and Pablo A. Parrilo February 15, 2013 Abstract We

More information

arxiv: v2 [math.co] 3 Dec 2008

arxiv: v2 [math.co] 3 Dec 2008 arxiv:0805.2814v2 [ath.co] 3 Dec 2008 Connectivity of the Unifor Rando Intersection Graph Sion R. Blacburn and Stefanie Gere Departent of Matheatics Royal Holloway, University of London Egha, Surrey TW20

More information

2 Q 10. Likewise, in case of multiple particles, the corresponding density in 2 must be averaged over all

2 Q 10. Likewise, in case of multiple particles, the corresponding density in 2 must be averaged over all Lecture 6 Introduction to kinetic theory of plasa waves Introduction to kinetic theory So far we have been odeling plasa dynaics using fluid equations. The assuption has been that the pressure can be either

More information

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t. CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when

More information

Numerically repeated support splitting and merging phenomena in a porous media equation with strong absorption. Kenji Tomoeda

Numerically repeated support splitting and merging phenomena in a porous media equation with strong absorption. Kenji Tomoeda Journal of Math-for-Industry, Vol. 3 (C-), pp. Nuerically repeated support splitting and erging phenoena in a porous edia equation with strong absorption To the eory of y friend Professor Nakaki. Kenji

More information

Shannon Sampling II. Connections to Learning Theory

Shannon Sampling II. Connections to Learning Theory Shannon Sapling II Connections to Learning heory Steve Sale oyota echnological Institute at Chicago 147 East 60th Street, Chicago, IL 60637, USA E-ail: sale@athberkeleyedu Ding-Xuan Zhou Departent of Matheatics,

More information

Constant-Space String-Matching. in Sublinear Average Time. (Extended Abstract) Wojciech Rytter z. Warsaw University. and. University of Liverpool

Constant-Space String-Matching. in Sublinear Average Time. (Extended Abstract) Wojciech Rytter z. Warsaw University. and. University of Liverpool Constant-Space String-Matching in Sublinear Average Tie (Extended Abstract) Maxie Crocheore Universite de Marne-la-Vallee Leszek Gasieniec y Max-Planck Institut fur Inforatik Wojciech Rytter z Warsaw University

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

lecture 36: Linear Multistep Mehods: Zero Stability

lecture 36: Linear Multistep Mehods: Zero Stability 95 lecture 36: Linear Multistep Mehods: Zero Stability 5.6 Linear ultistep ethods: zero stability Does consistency iply convergence for linear ultistep ethods? This is always the case for one-step ethods,

More information

Identical Maximum Likelihood State Estimation Based on Incremental Finite Mixture Model in PHD Filter

Identical Maximum Likelihood State Estimation Based on Incremental Finite Mixture Model in PHD Filter Identical Maxiu Lielihood State Estiation Based on Increental Finite Mixture Model in PHD Filter Gang Wu Eail: xjtuwugang@gail.co Jing Liu Eail: elelj20080730@ail.xjtu.edu.cn Chongzhao Han Eail: czhan@ail.xjtu.edu.cn

More information

SEISMIC FRAGILITY ANALYSIS

SEISMIC FRAGILITY ANALYSIS 9 th ASCE Specialty Conference on Probabilistic Mechanics and Structural Reliability PMC24 SEISMIC FRAGILITY ANALYSIS C. Kafali, Student M. ASCE Cornell University, Ithaca, NY 483 ck22@cornell.edu M. Grigoriu,

More information

Consistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material

Consistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material Consistent Multiclass Algoriths for Coplex Perforance Measures Suppleentary Material Notations. Let λ be the base easure over n given by the unifor rando variable (say U over n. Hence, for all easurable

More information

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013).

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013). A Appendix: Proofs The proofs of Theore 1-3 are along the lines of Wied and Galeano (2013) Proof of Theore 1 Let D[d 1, d 2 ] be the space of càdlàg functions on the interval [d 1, d 2 ] equipped with

More information

Understanding Machine Learning Solution Manual

Understanding Machine Learning Solution Manual Understanding Machine Learning Solution Manual Written by Alon Gonen Edited by Dana Rubinstein Noveber 17, 2014 2 Gentle Start 1. Given S = ((x i, y i )), define the ultivariate polynoial p S (x) = i []:y

More information

An improved self-adaptive harmony search algorithm for joint replenishment problems

An improved self-adaptive harmony search algorithm for joint replenishment problems An iproved self-adaptive harony search algorith for joint replenishent probles Lin Wang School of Manageent, Huazhong University of Science & Technology zhoulearner@gail.co Xiaojian Zhou School of Manageent,

More information

Proceedings of the 2016 Winter Simulation Conference T. M. K. Roeder, P. I. Frazier, R. Szechtman, E. Zhou, T. Huschka, and S. E. Chick, eds.
