Multi-commodity allocation for dynamic demands using PageRank vectors


Fan Chung, Paul Horn, and Jacob Hughes
University of California, San Diego, and Harvard University

Abstract. We consider a variant of the contact process concerning multi-commodity allocation on networks. In this process, the demands for several types of commodities are initially given at some specified vertices and then the demands spread interactively on a contact graph. To allocate supplies in such a dynamic setting, we use a modified version of PageRank vectors, called Kronecker PageRank, to identify vertices for shipping supplies. We analyze both the situation in which the demand distribution evolves mostly in clusters around the initial vertices and the case in which the demands spread to the whole network. We establish sharp upper bounds for the probability that the demands are satisfied as a function of PageRank vectors.

1 Introduction

Efficient allocation of resources to meet changing demands is a task arising in numerous applications. For example, institutions such as governments or corporations respond to the needs of a populace and wish to meet the demands within an allowed expenditure of resources. In some cases where demand spreads, one has to be able to act before demand becomes unmanageable. In the case of an epidemic, for instance, one desires to find a way to distribute medicine so that the disease will be contained. Such problems have been studied in several contexts using the contact process model [8], [6], [1], [10], [4]. In [4], it was demonstrated how to use PageRank vectors both to restrict the number of nodes inoculated and to provide certain containment guarantees. In this paper, we study a variant of the classical contact process, a continuous time Markov process on a contact graph. In our scenario, vertices in the graph each have varying levels of demand for multiple commodities. Demand at a vertex propagates to its neighbors at a rate depending on the current demand.
Our model allows for interactions between different commodities; demand for one commodity may influence demand for another. This fits many scenarios that arise: for instance, demand for iPhones may accelerate the demand for iPads. As another example, demand at a node might be viewed as a measure of discontent with the current supply of a resource. It is natural for an unhappy node to create unrest in its neighbors. As the modified contact process continues, demands at a vertex are increased at a rate based on the demands at neighboring vertices. They are also decreased at a satisfaction rate, which can be thought of as a frequency of shipments. Demand spreads at rates which are a linear combination of demands from neighboring vertices. These rates are encapsulated in a spread matrix, B, roughly analogous to the infectivity parameter in the classical contact process. The goal of this paper is to find satisfaction rates, dependent on the spread matrix B and the geometry of the contact graph, which ensure that eventually all vertices have no demand and the process dies out. Our process will be defined, in detail, in Section 2.

To satisfy the demands which evolve according to our model as defined in Section 2, the goal is to ship commodities and supply vertices with unsatisfied demands in an efficient way. The model here differs somewhat from typical resource allocation problems in the sense that we do not specify the location of the warehouses for the supply. We will not be concerned with either the sources of the supply or the detailed incremental costs of shipping supply. Instead, our goal is to identify how often to ship each commodity to a particular vertex, in order to contain and satisfy demands, given an initial seed set. The reader is referred to [3] for the usual resource allocation problem. Contact graphs of interest take many forms: cities and countries exert trade pressure on neighboring cities and countries, and communication on the internet can also spread demand for products, or discontent leading to a revolution. Instead of studying this problem on particular models for these contact graphs, we study the problem on arbitrary finite graphs. Two schemes of making shipments are considered. First is a global solution, which involves scheduling shipments to all vertices in the graph and ensures that all demand is satisfied in O(log n) time, with high probability and regardless of the initial demand. This is made precise in Theorem 1 once the model is formally defined. The next scheme is a local solution, in the sense that shipments are scheduled to only a subset of vertices which contains the initial demand. In particular, when the contact graph has some clustering structure we are interested in subsets so that the demand within the subset is satisfied quickly (in O(log n) time) and demands reach a vertex not receiving shipments with low probability. Precise results to this end are given in Section 5. This latter scheme relies on understanding the geometry of the particular contact graph being studied.
Our scheme uses PageRank to identify important vertices and to bound the probability that demand in our process leaves a set. We also introduce a variant of PageRank, which we call Kronecker PageRank and which is introduced in Section 3, that provides sharper bounds by better utilizing the structure of the spread matrix B as well as the geometry of the graph in its estimates. Our analysis provides a tradeoff in the following sense: we may use PageRank estimates to identify a set of vertices containing the initial demand which are important to ship to, or we may use our PageRank estimates to give a guarantee on the escape probability of leaving a particular set of our choosing. Precise results to this end are given in Theorems 3 and 4, using standard PageRank and Kronecker PageRank respectively.

2 Preliminaries and the Demand Model

We model demand spreading within an undirected simple graph, G = (V, E). We write v ∼ w when v and w are adjacent. For each vertex v ∈ V, let d_v be the degree of v, which is the number of neighbors of v. While not strictly necessary, we assume that there is a self-loop at each vertex. In this case, d_v includes v in the count of neighbors, and hence the loop counts as 1 towards the degree. We let n = |V|, the number of nodes of G. An exponential random variable with parameter λ has probability density function given by f(x) = λe^(−λx) for x ≥ 0, and 0 for x < 0. This distribution will be denoted Exp(λ). One important property of exponential random variables is the memoryless property: if X is an exponential random variable, then for any constants a, b > 0, P(X > a + b | X > a) = P(X > b).
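As a quick illustration (a minimal sketch with arbitrarily chosen parameters, not part of the paper), the memoryless property can be checked by simulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo check of the memoryless property for X ~ Exp(lam):
# P(X > a + b | X > a) should equal P(X > b) = exp(-lam * b).
# The values of lam, a, b are arbitrary, chosen only for illustration.
lam, a, b = 2.0, 0.5, 0.7
x = rng.exponential(1 / lam, 1_000_000)   # NumPy parameterizes by the mean 1/lam
conditional = (x[x > a] > a + b).mean()   # estimate of P(X > a+b | X > a)
direct = np.exp(-lam * b)                 # exact P(X > b)
```

With a million samples the two quantities agree to within a few tenths of a percent.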

If X and Y are independent and X ∼ Exp(λ_1), Y ∼ Exp(λ_2), then min{X, Y} ∼ Exp(λ_1 + λ_2). A Poisson point process at rate λ is a sequence of random variables {X_i}_{i=1}^∞ so that X_1, and X_i − X_{i−1} for i ≥ 2, have distribution Exp(λ). Before we describe our model, let us briefly recall the contact process on a graph G, which we denote CP(T, β, σ, G). In the contact process (see for example [1] or [4]), a disease initially infects a set T ⊆ V(G). The disease has an infectivity parameter, β, and each vertex has a certain amount of medicine σ_v. An infected vertex v infects its neighbor u at times given by a Poisson point process {X_uv} at rate β, and each infected vertex v is cured at times given by a Poisson point process at rate σ_v. In the most frequently studied case, σ is constant and the host graph is an infinite graph. The process ends when all vertices are cured, and the basic problem is to determine under which conditions on σ and β the process ends almost surely. In the case of finite graphs, if σ_v > 0 for every vertex, it is easy to observe that the process ends a.s., so the problem becomes determining how fast the process ends. The k-commodity dynamic demand model on a graph G is a variant of the contact process, DD(τ(0), B, σ, G, N). In this situation, B is a real valued k × k matrix (not assumed to be symmetric, or even non-negative), which we call the spread matrix. The supply function is σ : V → R^k, and τ(0) : V → N^k is the initial demand. The state of the process at time t is given by τ(t) : V → N^k, which gives the demands for each vertex at time t. We use τ_v(t) ∈ N^k to denote the demand at vertex v at time t, and τ_v^j(t) ∈ N to denote the demand for commodity j at time t. A node v is said to be satisfied at time t if τ_v(t) = 0, and unsatisfied otherwise. N ∈ N serves as a uniform bound for the maximum demand for any resource at any point (for instance, it could be the population of a city, if a vertex is a city).
The existence of N is simply to ensure integrability of some random variables. In the case of the contact process, N = 1. The spread matrix B = [β_ij] describes how the demand for one commodity influences demands for other commodities. The i, j entry of B, β_ij, determines the spread rate of the demand for commodity j that is caused by demand for commodity i. In particular, we can describe the rate of spread events as follows. If v is a node that is unsatisfied at time t, and w an adjacent vertex, then there are spread events from v to w with rates max{τ_v(t)B, 0}. That is, the rate at which τ_w^j increases due to the demand at v is given by max{Σ_i τ_v^i(t)β_ij, 0}. Here, when we say an event occurs with rate λ, we mean that the elapsed time until that event takes place is distributed as Exp(λ). Because the minimum of exponential random variables is itself an exponential random variable, we can capture the total spreading rates in a condensed form. We define the rate function at time t, ρ(t) : V → R^k, by ρ_v(t) = Σ_{w∼v} τ_w(t)B = (τ(t)(A ⊗ B))_v, where τ(t) is viewed as a row vector with indices indexed by V × [k]. ρ_v^i(t) is the rate at which τ_v^i is increasing at time t. Any spread events that would raise τ_v^i above N are ignored. Supply events occur with rates given by τ(t)diag(σ), independently of any neighboring supply events. That is, the time until τ_v^i is decreased by 1 is distributed as Exp(σ_v^i τ_v^i). We briefly give a construction of the process, to show it is well-defined. Let E denote the set of ordered edges; that is, ordered pairs that are edges in the graph, so that uv and vu are

distinct. We run independent Poisson point processes {X_e^{j,ρ}}_{e ∈ E(G), j ∈ [k], ρ ∈ [N]^k} so that X_e^{j,ρ} is at rate max{0, [ρB]_j}, and independent Poisson point processes {X_v^{i,n}}_{v ∈ V(G), i ∈ [k], n ∈ [N]} so that X_v^{i,n} is at rate nσ_v^i. Then these finitely many point processes can easily be seen to define the entire process; a spread event of type j from a vertex v, currently in state ρ, to a vertex u is controlled by the point process X_vu^{j,ρ}, with satisfaction events handled similarly. An advantage of such a formulation is that it gives an easy coupling between processes which shows that if B ≤ B′ pointwise, the stochastic process DD(τ(0), B′, σ, G, N) stochastically dominates DD(τ(0), B, σ, G, N). That is, in the coupling the demands in the B′ process are always at least those in the B process. This is accomplished by noting that the rates satisfy ρB ≤ ρB′ pointwise for all ρ ∈ [N]^k. We thus take point processes Y_e^{j,ρ} at rate [ρB′ − ρB]_j. If the point processes {X_e^{j,ρ}} and {X_v^{i,n}} are used to determine DD(τ(0), B, σ, G, N), then the point processes {X_e^{j,ρ} ∪ Y_e^{j,ρ}} and {X_v^{i,n}} are used to determine DD(τ(0), B′, σ, G, N). In particular, this allows us to replace B with B⁺, where B⁺_ij = max{B_ij, 0}, and conclusions about the extinction of the B⁺ process still hold for the B process. Furthermore, this turns out not to be entirely unreasonable. One hopes that the negative entries in B would afford better bounds on the extinction time, but in many cases with negative entries in B, extinctions of some demand types mean that the process is eventually run in a non-negative case. In light of this, we will assume for the rest of this paper that B is non-negative for convenience. Given an initial demand τ(0) and spread matrix B, our goal is to find a supply function σ such that demand is satisfied. Ideally we would like to do this with small supply rates. Furthermore, the supply rates should only depend on the contact graph G, the spread matrix B, and the initial demand τ(0), but not on t or τ(t).
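The construction above can be simulated directly. The following is a minimal sketch (not the paper's code) of a Gillespie-style simulation of DD(τ(0), B, σ, G, N), relying on the fact that the minimum of independent exponential clocks is itself exponential with the summed rate; the toy graph and parameters in the usage example are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_demand(A, B, sigma, tau0, t_max, N=10**6):
    """Gillespie-style simulation of the k-commodity dynamic demand process.
    A: n x n adjacency matrix (with self-loops), B: k x k spread matrix,
    sigma: (n, k) supply rates, tau0: (n, k) initial integer demands."""
    tau = tau0.astype(float).copy()
    t = 0.0
    n, k = tau.shape
    while t < t_max and tau.sum() > 0:
        spread = np.maximum(A @ tau @ B, 0.0)  # rate at which tau[v, j] increases
        supply = sigma * tau                   # rate at which tau[v, j] decreases
        total = spread.sum() + supply.sum()
        t += rng.exponential(1.0 / total)      # waiting time until the next event
        if rng.random() < spread.sum() / total:
            idx = rng.choice(n * k, p=(spread / spread.sum()).ravel())
            v, j = divmod(idx, k)
            tau[v, j] = min(tau[v, j] + 1, N)  # events raising demand above N are ignored
        else:
            idx = rng.choice(n * k, p=(supply / supply.sum()).ravel())
            v, j = divmod(idx, k)
            tau[v, j] -= 1
    return t, tau

# toy run: one commodity on two adjacent vertices with strong supply rates
A = np.array([[1.0, 1.0], [1.0, 1.0]])
B = np.array([[0.2]])
sigma = np.full((2, 1), 5.0)
tau0 = np.array([[3], [0]])
t_end, tau_end = simulate_demand(A, B, sigma, tau0, t_max=50.0)
```

With supply rates far above the spread rates, a run like this typically ends with all demand satisfied well before t_max.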
Because we take B to be an arbitrary non-negative k × k matrix, we will at times need to use various matrix norms in order to understand the process. For a square matrix B = [b_ij], there are many different matrix norms that can be used (see [7]). We will use the following notation:

1. ‖B‖_(1) = Σ_{i,j} |b_ij| is the entrywise ℓ_1 norm.
2. ‖B‖_1 = max_j Σ_i |b_ij| is the maximum column sum norm.
3. ‖B‖_∞ = max_i Σ_j |b_ij| is the maximum row sum norm.
4. ‖B‖_2 = max{√λ : λ is an eigenvalue of B^T B} is the spectral norm.

3 PageRank and Kronecker PageRank

The notion of PageRank was first introduced by Brin and Page [2] in 1998 for Google's search algorithms. Although PageRank was originally used for the Web graph, it is well defined on any finite graph G. The basis of PageRank is random walks on graphs. A walk is a sequence of vertices (v_0, v_1, ..., v_k) where v_i ∼ v_{i+1}. A simple random walk of length k is a sequence of random variables (x_0, ..., x_k) where the starting vertex x_0 is chosen according to some distribution, and P(x_{i+1} = v | x_i) = 1/d_{x_i} if x_i ∼ v, and 0 otherwise.

Let D be the diagonal degree matrix with entries D_vv = d_v, and let A be the adjacency matrix with entries A_vw = 1 if v ∼ w, and 0 otherwise. Then the transition probability matrix for a random walk on G is given by W = D^(−1)A. We will mainly use a modified version of PageRank, called personalized PageRank. Personalized PageRank has two parameters, a jumping constant α ∈ [0, 1] and a seed s, which is some probability distribution on the vertex set V of G. The personalized PageRank vector pr(α, s) for jumping constant α and seed distribution s on V is given by

pr(α, s) = α Σ_{l=0}^∞ (1 − α)^l sW^l.

Note that here we view s as a row vector, which is our convention for all vectors throughout this paper. We note that the PageRank vector is also the solution to the recurrence relation pr(α, s) = αs + (1 − α)pr(α, s)W. The original definition of PageRank [2] is the special case where s is the uniform distribution over all the vertices. For a subset of vertices H ⊆ V, the volume of H, vol(H), is the sum of the degrees of the vertices of H. The Cheeger ratio of H, h(H), measures the cut between H and H̄ via the relationship

h(H) = e(H, H̄) / min{vol(H), vol(H̄)}.

The α-core of a subset H is the set of vertices

C_α = {v ∈ H : pr(α, χ_v) 1_{H̄} ≤ h(H)/α},

where χ_v is the distribution concentrated on v and 1_{H̄} is the indicator vector of H̄. Personalized PageRank naturally gives bounds on the probability that demands leave a given set in the k-commodity dynamic demand model. These bounds lose some of the structure of the process given by B, however. In order to better understand the process and get tighter bounds, we will use a generalization of the personalized PageRank vector, the Kronecker PageRank vector. If B is a k × k matrix and A an n × n matrix, then the Kronecker product A ⊗ B is the nk × nk block matrix

A ⊗ B = [a_11 B ... a_1n B; ... ; a_n1 B ... a_nn B].

With this, we can define Kronecker PageRank.

Definition (Kronecker PageRank). Let B be a square k × k matrix with spectral radius strictly less than 1, and W be the transition matrix for a random walk on a graph G. Let s be a non-negative vector in R^(k|V|). The Kronecker PageRank vector with parameters B and s is defined as

Kpr(B, s) = Σ_{l=0}^∞ s(W ⊗ B)^l = Σ_{l=0}^∞ s(W^l ⊗ B^l).

Requiring the spectral radius of B to be less than 1 is necessary to ensure convergence of the infinite sum, as the spectrum of W ⊗ B consists of the products of the eigenvalues of W with the eigenvalues of B. Since the eigenvalues of W have absolute value at most 1, the sum will converge. We note that in the case where B is a 1 × 1 matrix B = [β] with β < 1 and s is a probability distribution, we have the relationship

Kpr(B, s) = Σ_{l=0}^∞ s(βW)^l = Σ_{l=0}^∞ β^l sW^l = pr(1 − β, s)/(1 − β),

so the Kronecker PageRank is a natural extension of personalized PageRank. We will see in Theorem 4 that the Kronecker PageRank arises naturally in our analysis in Section 5, and gives better bounds than those afforded by standard PageRank by incorporating the spread matrix.

4 Global analysis: Supplying every vertex

Here we show that if supply rates are above a certain threshold, then with probability approaching 1 demands will be satisfied. Recall that B is a non-negative real valued k × k matrix, and σ is the vector of supply rates for the process DD(τ(0), B, σ, G, N).

Theorem 1. Consider the k-commodity demand model on a graph G with n vertices parameterized by spread matrix B = [β_ij]. Let X(t) = ‖τ(t)‖_1, the total amount of demand at time t. If the supply rates to each vertex v are σ_v^i > d_v Σ_j (β_ij + β_ji)/2 + δ for δ > 0, then with probability 1 − ε all vertices are satisfied at time t for all t > (1/δ)(log(nk) + log(X(0)) + log(1/ε)).

Proof. We begin by considering the quantity (∂/∂t)E[τ(t)]. From the discussion in Section 2, we know that demand is increasing with rates given by ρ(t) = τ(t)(A ⊗ B), but also demand decreases proportionally to the supply rates and current demand.
Indeed, let S = diag(σ), the diagonal nk × nk matrix with entries given by the supply vector. Then demand decreases at each vertex according to rates given by the vector τ(t)S. It is not difficult to encapsulate all of the above information in the simple expression

(∂/∂t)E[τ(t)] = E[ρ(t) − τ(t)S] = E[τ(t)](A ⊗ B − S). (1)

A detailed proof of (1) is left to the appendix for space reasons. Solving the matrix differential equation with initial condition E[τ(0)] = τ(0) yields

E[τ(t)] = τ(0)e^{t(A ⊗ B − S)}. (2)

Let Q = A ⊗ B − S. Then by [5], ‖e^{tQ}‖_2 ≤ e^{tν}, where ν is the largest eigenvalue of (Q + Q^T)/2. We note that (Q + Q^T)/2 = A ⊗ (B + B^T)/2 − S, which has diagonal terms β_ii − σ_v^i, ranging over all values of v and i. By the Gershgorin Circle Theorem, the eigenvalues of (Q + Q^T)/2 are contained in the intervals

[(2 − d_v)β_ii − d_v Σ_{j≠i} (β_ij + β_ji)/2 − σ_v^i,  d_v Σ_j (β_ij + β_ji)/2 − σ_v^i].

Since σ_v^i > d_v Σ_j (β_ij + β_ji)/2 + δ, all the eigenvalues of (Q + Q^T)/2 are less than −δ. Therefore

E[X(t)] = ‖τ(0)e^{t(A ⊗ B − S)}‖_1 ≤ √(nk) ‖τ(0)e^{t(A ⊗ B − S)}‖_2 ≤ √(nk) ‖τ(0)‖_2 ‖e^{t(A ⊗ B − S)}‖_2 ≤ √(nk) ‖τ(0)‖_2 e^{tν} ≤ nk X(0)e^{−tδ}.

Thus Markov's inequality gives that P(X(t) > 0) < ε if t > (1/δ)(log(nk) + log(X(0)) + log(1/ε)).

We note that this approach works for all initial distributions τ(0), and on any graph G. In particular, it is agnostic to the shape of the graph. This indicates that in many situations this approach may be overkill and that we could have used smaller supply rates. In the next section, we analyze the process more carefully and give conditions that depend on the initial distribution of demand and the geometry of the underlying contact graph.

5 Local analysis: Supplying a small subset

For the remainder of the discussion, it is convenient to introduce a reformulation of the model that takes advantage of the fact that demands take on integer values. Rather than view demands as a function τ : V → N^k, we view demands as discrete objects sitting on each node. Borrowing language from chip-firing games on graphs (see, for example, [9]), we view units of demand as chips located on vertices of the graph. For example, if k = 7, then for a vertex v with τ_v(t) = (0, 1, 2, 0, 1, 0, 3) we would say that at time t there were one 2-chip, two 3-chips, one 5-chip, and three 7-chips at vertex v, the 2-chip corresponding to 1 unit of demand for commodity 2, etc.
Unlike in classical chip-firing games, the number of chips is not static, and the process is in continuous time. We restate the possible transitions in terms of demand chips. For an i-chip at vertex v, there are two types of transition events:

For each vertex w ∼ v and each j = 1, ..., k, a j-chip is added at w at rate β_ij. When this occurs we say that the new j-chip is created by the i-chip. The i-chip itself is removed with rate σ_v^i.

Due to the properties of exponential random variables, the rates add linearly, and the model is equivalent to the original description discussed in Section 2. The main advantage of this reformulation is the ability to trace back the history of a chip. If there is a chip c at vertex v at time t, then either c existed at time t = 0, or there is a sequence of chips (c_0, ..., c_l = c) located at vertices along a walk π = (v_0, v_1, ..., v_l = v) with the following properties:

1. c_0 existed at t = 0.
2. c_r is created by c_{r−1} for r = 1, ..., l.

We allow π to have repeated vertices to allow for the case where demand created more demand at the same vertex. If a chip c exists at time 0, we refer to it as an initial chip. For a walk π = (v_0, v_1, ..., v_l) and a chip c_0 located at v_0, we define S_{π,c_0} to be the event that there is a sequence of chips (c_0, ..., c_l) located respectively at (v_0, v_1, ..., v_l) such that c_r is created by c_{r−1} for r = 1, ..., l. It is important to note that S_{π,c_0} occurring does not imply that there is any demand at v_l at time t, because it could be satisfied sometime before t. However, if there is demand at v_l at time t, then S_{π,c} must have occurred for some initial chip c at a vertex v_0 and some walk π from v_0 to v_l. We begin by relating P(S_{π,c_0}) to the length of the walk π. Inspired by Theorem 1, we make the assumption that supply rates are proportional to the degrees of the vertices. That is, we assume that σ_v^i > µ_i d_v for all v, for constants µ_i > 0.

Lemma 2. Let M = diag(µ_1, ..., µ_k), B̂ = M^(−1)B and ζ = min{‖B̂‖_1, ‖B̂‖_∞}. Then for any chip c_0 located at v_0 and any walk π = (v_0, ..., v_l) of length l,

P(S_{π,c_0}) ≤ kζ^l / Π_{j=0}^{l−1} d_{v_j}.

Proof. Let S_r denote the event that a chip c_r at v_r creates a chip at v_{r+1}.
If c_r is an i-chip, then for it to create any chip at v_{r+1}, a spread event must occur before c_r is removed. The time until c_r creates a j-chip at v_{r+1} is an exponential random variable with rate β_ij. Since the time until c_r is removed is given by Exp(σ_v^i), the probability of c_r creating a j-chip is

β_ij/(β_ij + σ_v^i) ≤ β_ij/σ_v^i < β_ij/(µ_i d_{v_r}).

Thus P(S_r) < Σ_j β_ij/(µ_i d_{v_r}) ≤ ‖B̂‖_∞/d_{v_r}. For a walk π of length l, we want to consider the intermediate steps more carefully. Since there are l transitions that occur, we can use the same reasoning as above to obtain the bound

P(S_{π,c_0}) < (Π_{r=0}^{l−1} 1/d_{v_r}) ‖B̂^l‖_(1) ≤ (Π_{r=0}^{l−1} 1/d_{v_r}) k‖B̂^l‖_1 ≤ (Π_{r=0}^{l−1} 1/d_{v_r}) k‖B̂‖_1^l.

The factor of k that appears in the final lines above is just a consequence of switching from the entrywise ℓ_1 norm ‖B̂^l‖_(1) to the maximum column sum norm ‖B̂^l‖_1 (see [7]). We could have just as easily switched to the maximum row sum norm and obtained the term k‖B̂‖_∞^l, and so it follows that

P(S_{π,c_0}) < min{k(Π_{r=0}^{l−1} 1/d_{v_r})‖B̂‖_1^l, k(Π_{r=0}^{l−1} 1/d_{v_r})‖B̂‖_∞^l} = kζ^l / Π_{r=0}^{l−1} d_{v_r}.

We note that the use of ζ = min{‖B̂‖_1, ‖B̂‖_∞} in Lemma 2 reflects the difficulty in working with arbitrary spread matrices B. For certain classes of spread matrices (e.g. if B is symmetric or diagonalizable) it is possible to obtain tighter bounds. Lemma 2 will allow us to obtain a bound using PageRank, but note that our use of matrix norms ignores some of the structure of the spread matrix B. Following the proof of Theorem 3 below, a more careful analysis fully using the structure of the spread matrix B will lead naturally to the use of Kronecker PageRank, which we explore in Theorem 4.

Theorem 3. Suppose that the initial demand is contained in S ⊆ H ⊆ V, each vertex v ∈ H has supply rates σ_v^i > µ_i d_v, and σ_w^i = 0 for w ∉ H. Let M = diag(µ_1, ..., µ_k), B̂ = M^(−1)B and ζ = min{‖B̂‖_1, ‖B̂‖_∞}. Let x(t) be defined by x_v(t) = Σ_i τ_v^i(t), and X(t) = ‖τ(t)‖_1. Let E_H denote the event that demands spread outside the set H. Then:

1. P(E_H) ≤ (X(0)/(1 − ζ)) pr(1 − ζ, x(0)/X(0)) 1_{H̄}.
2. If S is in the (1 − ζ)-core of H, then P(E_H) ≤ X(0)h(H)/(ζ(1 − ζ)), where h(H) is the Cheeger ratio of H.

Proof. Let P_l denote the set of all walks of length l from an initial chip in S to H̄ such that the first l − 1 steps are in H. Let P = ∪_{l=1}^∞ P_l. The key observation is that if w ∉ H ever has demand, then S_{π,c} must have occurred for some initial chip c and walk π from the location of c to w. Thus we can use the union bound to get

P(E_H) ≤ Σ_{(π,c) ∈ P} P(S_{π,c}) = Σ_l Σ_{v_0 ∈ S} Σ_{c at v_0} Σ_{v_l ∈ H̄} Σ_{π = (v_0,...,v_l) ∈ P_l} P(S_{π,c}) ≤ Σ_l Σ_{v_0 ∈ S} Σ_{c at v_0} Σ_{v_l ∈ H̄} Σ_{π ∈ P_l} ζ^l / Π_{r=0}^{l−1} d_{v_r} = Σ_l x(0)ζ^l (D^(−1)A)^l 1_{H̄} = Σ_l x(0)ζ^l W^l 1_{H̄} = (X(0)/(1 − ζ)) pr(1 − ζ, x(0)/X(0)) 1_{H̄},

proving the first statement. The second statement follows by the same proof as Theorem 3.2 of [4]. Finally we show how Kronecker PageRank arises in a natural way as a bound on the escape probability of this process.

Theorem 4. Suppose that the initial demand is contained in S ⊆ H ⊆ V, and each vertex v ∈ H has supply rates σ_v^i ≥ µ_i d_v. Let M = diag(µ_1, ..., µ_k) and B̂ = M^(−1)B, and suppose the spectral radius of B̂ is strictly less than 1. Let X(t) = ‖τ(t)‖_1, the total amount of demand at time t. Let E_H denote the event that demands spread outside the set H. Then E_H can be bounded above using the Kronecker PageRank vector via the relationship

P(E_H) ≤ X(0) Kpr(B̂, τ(0)/X(0)) 1_{H̄},

where 1_{H̄} indicates the coordinates corresponding to vertices outside H.

Proof. Let f be the vector indicator function of commodity type on chips; that is, f(c) = e_i if c is an i-chip, where e_i denotes the ith standard basis vector for R^k. Let C_0 denote the set of initial chips. By the same methods that were used in the proof of Lemma 2, we can bound the probability that demand originating from c ever spreads along a walk π = (v_0, v_1, ..., v_l) by

P(S_{π,c}) ≤ ‖f(c)B̂^l‖_1 / Π_{r=0}^{l−1} d_{v_r}.

Therefore, using the same technique as in the proof of Theorem 3, we obtain the bound

P(E_H) ≤ Σ_l Σ_{c ∈ C_0} Σ_{v_l ∈ H̄} Σ_{π = (v_0,...,v_l) ∈ P_l} ‖f(c)B̂^l‖_1 / Π_{r=0}^{l−1} d_{v_r} = Σ_l τ(0)((D^(−1)A) ⊗ B̂)^l 1_{H̄} = Σ_l τ(0)(W ⊗ B̂)^l 1_{H̄} = X(0) Kpr(B̂, τ(0)/X(0)) 1_{H̄}.

On the event of non-escape, we would like to guarantee that all demand is satisfied quickly. To make this precise, let S_t denote the event that all of the vertices are satisfied at time t. In order to complete the analysis of the local case, we would like to bound P(S_t^c | E_H^c), where E_H is as in Theorems 3 and 4. Such a bound is not immediately given by Theorem 1. To derive one, consider running a modified Dirichlet process which is identical to the standard process with the same supply rates, except demand leaving H is ignored.
Let S′_t denote the event that in the Dirichlet process all of the vertices are satisfied at time t; then P(S′_t) can be bounded directly by Theorem 1, as the Dirichlet process restricted to vertices in H is the standard process on H. Furthermore, S_t^c ∩ E_H^c ⊆ (S′_t)^c, so P(S_t^c ∩ E_H^c) ≤ P((S′_t)^c). Therefore

P(S_t^c | E_H^c) = P(S_t^c ∩ E_H^c)/P(E_H^c) ≤ P((S′_t)^c)/P(E_H^c).
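Numerically, the Kronecker PageRank escape bound of Theorem 4 amounts to a single linear solve in the product space V × [k]. The following sketch illustrates this; the graph, B̂, and initial demand are invented for illustration, and it assumes the spectral radius of W ⊗ B̂ is below 1 so that the geometric series equals the matrix inverse:

```python
import numpy as np

def kronecker_escape_bound(A, B_hat, tau0, H):
    """X(0) * Kpr(B_hat, tau0/X(0)) summed over coordinates (v, i) with v outside H.
    Uses Kpr(B, s) = s (I - W (x) B)^{-1}, valid when the spectral radius of
    W (x) B_hat is less than 1."""
    n, k = tau0.shape
    W = A / A.sum(axis=1)[:, None]                # random-walk matrix D^{-1} A
    X0 = tau0.sum()
    s = tau0.ravel() / X0                         # seed vector indexed by V x [k]
    kpr = s @ np.linalg.inv(np.eye(n * k) - np.kron(W, B_hat))
    outside = np.repeat([v not in H for v in range(n)], k)
    return X0 * kpr[outside].sum()

# toy data: a 4-vertex path with self-loops and a 2-commodity matrix B_hat
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
B_hat = np.array([[0.3, 0.1],
                  [0.1, 0.2]])
tau0 = np.zeros((4, 2))
tau0[0] = [2, 1]                                  # initial demand at vertex 0
bound = kronecker_escape_bound(A, B_hat, tau0, H={0, 1})
```

Enlarging H can only decrease the bound, since the Kronecker PageRank vector is entrywise non-negative.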

Combining this observation with Theorems 3 and 4 yields that the probability of escape from H is bounded, and that if the process does not escape from H then it dies out quickly. Theorems 3 and 4 can be used in two different ways. As stated, they provide a way to bound the probability that demands escape from a given subset. However, they can also be used to construct such a bounding subset. For example, given initial demand τ(0) contained in an initial set of vertices S ⊆ V, we can algorithmically construct H such that demand stays in H with probability 1 − ε by iteratively selecting vertices with the highest (Kronecker) PageRank.

6 An Example

An immediate question is whether anything is actually gained by the introduction of Kronecker PageRank. Suppose we have a spread matrix B, and some initial demand on a graph G. We wish to identify a subset of vertices H ⊆ V(G) to make shipments to so that the escape probability is at most ε. We may use either Theorem 3 or Theorem 4 to identify such a set. The bound afforded by Theorem 4 is clearly sharper than the bound in Theorem 3, as the structure of the spread matrix B is taken into account, but it is not guaranteed that the identified set is actually smaller. In many cases it actually is, though depending on B it may not be significantly smaller. To illustrate, we give a simple example calculation on synthetic data. Our graph G is an instance of the following random process, which is designed to create a graph containing tighter clusters that are slightly more sparsely connected to neighboring clusters: we begin with a cycle on 10 vertices. Each vertex is then replaced by an instance of the random graph G_{10,.3}, that is, a graph on 10 vertices with each edge existing independently with probability .3. Inter-cluster edges are then created between vertices in neighboring clusters with probability .05. We consider the case where k = 4, with a fixed 4 × 4 spread matrix B. The initial demand τ(0) is supported on a few vertices of one cluster. In addition we assume µ_i = 1, and ζ = .85.
We demonstrate the difference between Theorems 3 and 4 in the following way. The figure below shows the graph G. The demands start in the large outlined vertices and spread outward from there. Theorem 4 states that with 95% probability, demands stay in the circular vertices. Theorem 3 states that with 95% probability, demands stay in the diamond and circular vertices. This small example illustrates how the Kronecker PageRank can be used to obtain improved results.
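The greedy construction mentioned at the end of Section 5 can be sketched as follows, here with standard personalized PageRank on a toy graph (the seed, ζ, and ε are arbitrary, and the escape bound recomputed as vertices are added follows the Theorem 3-style form X(0)/(1 − ζ) · pr(1 − ζ, x(0)/X(0)) restricted outside H):

```python
import numpy as np

def grow_containment_set(A, zeta, x0, eps):
    """Greedy sketch: start from the support of the initial demand and repeatedly
    add the vertex of highest personalized PageRank until the escape bound
    restricted outside H drops to eps (illustrative only)."""
    n = A.shape[0]
    W = A / A.sum(axis=1)[:, None]
    X0 = x0.sum()
    alpha = 1 - zeta
    pr = alpha * (x0 / X0) @ np.linalg.inv(np.eye(n) - (1 - alpha) * W)
    H = set(np.flatnonzero(x0))
    for v in np.argsort(-pr):                     # vertices by decreasing PageRank
        outside = [u for u in range(n) if u not in H]
        if X0 / (1 - zeta) * pr[outside].sum() <= eps:
            break
        H.add(int(v))
    return H

# toy graph: a 6-cycle with self-loops, all initial demand at vertex 0
A = np.eye(6) + np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)
x0 = np.zeros(6)
x0[0] = 2.0
H = grow_containment_set(A, zeta=0.5, x0=x0, eps=0.25)
```

By construction, the returned set always contains the support of the initial demand, and the loop stops as soon as the recomputed bound is at most ε (or all vertices have been added).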

References

1. C. Borgs, J. Chayes, A. Ganesh, and A. Saberi, How to distribute antidote to control epidemics, Random Structures & Algorithms, 37 (2010).
2. S. Brin and L. Page, The anatomy of a large-scale hypertextual web search engine, Computer Networks and ISDN Systems, 30 (1998).
3. Y. Chevaleyre, P. Dunne, U. Endriss, J. Lang, M. Lemaitre, N. Maudet, J. Padget, S. Phelps, J. Rodriguez-Aguilar, and P. Sousa, Issues in multiagent resource allocation, Informatica (2006).
4. F. Chung, P. Horn, and A. Tsiatas, Distributing antidote using PageRank vectors, Internet Mathematics, 6 (2009).
5. G. Dahlquist, Stability and error bounds in the numerical integration of ordinary differential equations, Kungl. Tekn. Högsk. Handl. Stockholm, No. 130 (1959).
6. A. Ganesh, L. Massoulié, and D. Towsley, The effect of network topology on the spread of epidemics, in Proceedings of IEEE INFOCOM, vol. 2, IEEE, 2005.
7. R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press.
8. I. Z. Kiss, D. M. Green, and R. R. Kao, Infectious disease control using contact tracing in random and scale-free networks, Journal of the Royal Society Interface, 3 (2006).
9. C. Merino, The chip-firing game, Discrete Mathematics, 302 (2005).
10. M. Newman, Spread of epidemic disease on networks, Physical Review E, 66 (2002).

7 Appendix

In this section we establish the differential equation used in the proof of Theorem 1, namely:

Lemma 5. If τ(t) denotes the demand vector at time t, A is the adjacency matrix of G, B denotes the spread matrix and S = diag(σ) denotes the supply matrix, then

(∂/∂t)E[τ(t)] = E[τ(t)](A ⊗ B − S).

The key to Lemma 5 is the following two well-known and simple facts concerning exponentially distributed random variables. We use the notation f(h) = O_{h→0}(g(h)) to indicate that |f(h)| ≤ C g(h) for h sufficiently small.

Lemma 6. Suppose X is an exponentially distributed waiting time with rate λ. Then

P(X < h) = λh + O_{h→0}(h²).

An immediate corollary is:

Lemma 7. Suppose X, Y are independent exponentially distributed waiting times with rates λ_1, λ_2. Then P(X < h, Y < h) = O_{h→0}(h²).

Proof of Lemma 5. Fix a vertex v and commodity i. We will show that

(∂/∂t)E[τ_v^i(t)] = [E[τ(t)](A ⊗ B − S)]_v^i.

Since this holds for all v and i, the result will follow. To do this, we compute the derivative by the definition; that is, we compute

lim_{h→0} E[τ_v^i(t + h) − τ_v^i(t)]/h.

To do this, consider the conditional expectation E[τ_v^i(t + h) − τ_v^i(t) | τ(t)]. Note that by Lemma 7, the probability that two independent events (either two spread events, two satisfy events, or a spread and a satisfy event) occur in (t, t + h) is O_{h→0}(h²). On the other hand, given a neighbor u of v and a commodity j, the probability of a spread event originating from this neighbor and commodity in time (t, t + h) is exactly β_ji τ_u^j(t)h + O_{h→0}(h²). Likewise, the probability of a satisfaction event in this time is τ_v^i(t)σ_v^i h + O_{h→0}(h²). Linearity of expectation yields

E[τ_v^i(t + h) − τ_v^i(t) | τ(t)] = [τ(t)(A ⊗ B − S)]_v^i h + O_{h→0}(h²).

In particular, note that the O_{h→0}(h²) term means that there is a (large) constant C = C(τ(t), A, B) so that |O_{h→0}(h²)| ≤ Ch² for h ≤ 1. Note that due to our conditioning this constant depends on τ(t), but critically not on h; in particular, this constant is σ(τ(t))-measurable. By the tower property of conditional expectation,

lim_{h→0} E[τ_v^i(t + h) − τ_v^i(t)]/h = lim_{h→0} E[E[τ_v^i(t + h) − τ_v^i(t) | τ(t)]]/h = lim_{h→0} (E[[τ(t)(A ⊗ B − S)]_v^i h] + E[E[O_{h→0}(h²) | τ(t)]])/h = [E[τ(t)](A ⊗ B − S)]_v^i + lim_{h→0} E[O_{h→0}(h)].

It suffices to show that
\[ \lim_{h\to 0} \big|\mathbb{E}[O_{h\to 0}(h)]\big| \le \lim_{h\to 0} \mathbb{E}\big[|O_{h\to 0}(h)|\big] = 0. \]
But recall that the $O_{h\to 0}(h)$ term is bounded by $C(\tau(t), A, B)\, h$ for $h$ sufficiently small. Thus it is enough to show that
\[ \lim_{h\to 0} \mathbb{E}\big[C(\tau(t), A, B)\, h\big] = 0. \]
This follows from the monotone convergence theorem, so long as $\mathbb{E}[C(\tau(t), A, B)] < \infty$. To complete the proof we note that we can give an upper bound on $C(\tau(t), A, B)$ in terms of $\|\tau(t)\|_1$, $n$, and $\max_{i,j}\{B_{ij}\}$. Indeed, the rates of the active point processes are at most $\|\tau(t)\|_1 \max_{i,j}\{B_{ij}\}$, and thus the probability that any pair of point processes both have events in the period $(t, t+h)$ is bounded by
\[ C(\tau(t), A, B)\, h^2 \le C'\, \|\tau(t)\|_1^2 \max_{i,j}\{B_{ij}\}^2\, n^2\, h^2. \]
But $\|\tau(t)\|_1 \le nNk$, where $n = |G|$, $k$ is the number of commodities, and $N$ is our uniform upper bound for the demand at a vertex (indeed, this is precisely the motivation for such a bound).
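As a sanity check on Lemma 5, a small Monte Carlo (Gillespie) simulation of the demand process can be compared against the predicted mean $\tau(0)\,e^{T(A\otimes B - S)}$. This is only a sketch under assumed parameters: a hypothetical single-edge graph with one commodity, so that $A\otimes B$ collapses to $b\,A$ for a scalar spread rate $b$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: a single edge (two vertices), one commodity.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])          # adjacency matrix of G
b = 0.5                             # scalar spread rate (the 1x1 spread matrix B)
sigma = np.array([1.0, 1.0])        # supply (satisfaction) rates
M = b * A - np.diag(sigma)          # A (x) B - S collapses to b*A - S here

T = 0.5                             # time horizon
tau0 = np.array([3.0, 0.0])         # initial demand: three units at vertex 1

def expm(X, terms=40):
    """Truncated Taylor series for the matrix exponential (fine for tiny X)."""
    out, term = np.eye(len(X)), np.eye(len(X))
    for k in range(1, terms):
        term = term @ X / k
        out = out + term
    return out

def simulate():
    """One Gillespie run of the demand process up to time T."""
    tau, t = tau0.copy(), 0.0
    while True:
        spread = b * (A.T @ tau)    # rate of new demand arriving at each vertex
        satisfy = sigma * tau       # rate of demand satisfaction at each vertex
        rates = np.concatenate([spread, satisfy])
        total = rates.sum()
        if total == 0.0:
            return tau              # all demand satisfied; the state is absorbed
        t += rng.exponential(1.0 / total)
        if t > T:
            return tau
        k = rng.choice(len(rates), p=rates / total)
        if k < 2:
            tau[k] += 1.0           # spread event: one new unit of demand at vertex k
        else:
            tau[k - 2] -= 1.0       # satisfaction event: one unit removed

mc_mean = np.mean([simulate() for _ in range(4000)], axis=0)
ode_mean = tau0 @ expm(T * M)       # Lemma 5: E[tau(T)] = tau(0) exp(T(A (x) B - S))
print("Monte Carlo:", mc_mean, " ODE:", ode_mean)
```

With 4000 runs the empirical mean should agree with the ODE prediction to within Monte Carlo error.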


More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu November 16, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision

More information

1 Complex Networks - A Brief Overview

1 Complex Networks - A Brief Overview Power-law Degree Distributions 1 Complex Networks - A Brief Overview Complex networks occur in many social, technological and scientific settings. Examples of complex networks include World Wide Web, Internet,

More information

An Optimal Control Problem Over Infected Networks

An Optimal Control Problem Over Infected Networks Proceedings of the International Conference of Control, Dynamic Systems, and Robotics Ottawa, Ontario, Canada, May 15-16 214 Paper No. 125 An Optimal Control Problem Over Infected Networks Ali Khanafer,

More information

Link Analysis. Leonid E. Zhukov

Link Analysis. Leonid E. Zhukov Link Analysis Leonid E. Zhukov School of Data Analysis and Artificial Intelligence Department of Computer Science National Research University Higher School of Economics Structural Analysis and Visualization

More information

Consensus on networks

Consensus on networks Consensus on networks c A. J. Ganesh, University of Bristol The spread of a rumour is one example of an absorbing Markov process on networks. It was a purely increasing process and so it reached the absorbing

More information

The Strong Largeur d Arborescence

The Strong Largeur d Arborescence The Strong Largeur d Arborescence Rik Steenkamp (5887321) November 12, 2013 Master Thesis Supervisor: prof.dr. Monique Laurent Local Supervisor: prof.dr. Alexander Schrijver KdV Institute for Mathematics

More information