MATHEMATICAL ENGINEERING TECHNICAL REPORTS. Polynomial Time Perfect Sampler for Discretized Dirichlet Distribution

Size: px

Start display at page:

Download "MATHEMATICAL ENGINEERING TECHNICAL REPORTS. Polynomial Time Perfect Sampler for Discretized Dirichlet Distribution"

Roger Hunter
5 years ago
Views:

1 MATHEMATICAL ENGINEERING TECHNICAL REPORTS Polynomial Time Perfect Sampler for Discretized Dirichlet Distriution Tomomi MATSUI and Shuji KIJIMA METR April 003 DEPARTMENT OF MATHEMATICAL INFORMATICS GRADUATE SCHOOL OF INFORMATION SCIENCE AND TECHNOLOGY THE UNIVERSITY OF TOKYO BUNKYO-KU, TOKYO , JAPAN WWW page:

2 The METR technical reports are pulished as a means to ensure timely dissemination of scholarly and technical work on a non-commercial asis. Copyright and all rights therein are maintained y the authors or y other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked y each author s copyright. These works may not e reposted without the explicit permission of the copyright holder.

3 Polynomial Time Perfect Sampler for Discretized Dirichlet Distriution Tomomi Matsui and Shuji Kijima Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Bunkyo-ku, Tokyo , Japan. version: April 4th Astract. In this paper, we propose a perfect exact sampling algorithm according to a discretized Dirichlet distriution. Our algorithm is a monotone coupling from the past algorithm, which is a Las Vegas type randomized algorithm. We propose a new Markov chain whose limit distriution is a discretized Dirichlet distriution. Our algorithm simulates transitions of the chain On 3 ln times where n is the dimension the numer of parameters and / is the grid size for discretization. Thus the otained ound does not depend on the magnitudes of parameters. In each transition, we need to sample a random variale according to a discretized eta distriution -dimensional Dirichlet distriution. To show the polynomiality, we employ the path coupling method carefully and show that our chain is rapidly mixing. Introduction In this paper, we propose a polynomial time perfect exact sampling algorithm according to a discretized Dirichlet distriution. Our algorithm is a monotone coupling from the past algorithm, which is a Las Vegas type randomized algorithm. We propose a new Markov chain for generating random samples according to a discretized Dirichlet distriution. In each transition of our Markov chain, we need a random sample according to a discretized eta distriution -dimensional Dirichlet distriution. Our algorithm simulates transitions of the Markov chain On 3 ln times in expectation where n is the dimension the numer of parameters and / is the grid size for discretization. To show the polynomiality, we employ the path coupling method and show that the mixing rate of our chain is ounded y nn + ln n n/. Statistical methods are widely studied in ioinformatics since they are powerful tools to discover genes causing a common disease from a numer of oserved data. These methods often use EM algorithm, Markov chain Monte Carlo Supported y Superroust Computation Project of the st Century COE Program Information Science and Technology Strategic Core.

4 method, Gis sampler, and so on. The Dirichlet distriution appears as prior and posterior distriution for the multinomial distriution in these methods since the Dirichlet distriution is the conjugate prior of parameters of the multinomial distriution []. For example, Niu, Qin, Xu, and Liu proposed a Bayesian haplotype inference method [7], which decides phased paternal and maternal individual genotypes proailistically. Another example is a population structure inferring algorithm y Pritchard, Stephens, and Donnely [8]. Their algorithm is ased on MCMC method. In these examples, the Dirichlet distriution appears with various dimensions and various parameters. Thus we need an efficient algorithm for sampling from the Dirichlet distriution with aritrary dimensions and parameters. One approach of sampling from a continuous Dirichlet distriution is y rejection see [4] for example. In this way, the numer of required samples from the gamma distriution is equal to the size of the dimension of the Dirichlet distriution. Though we can sample from the gamma distriution y using rejection sampling, the ratio of rejection ecomes higher as the parameter is smaller. Thus, it does not seems effective way for small parameters. Another approach is a descretization of the domain and adoption of Metropolis-Hastings algorithm. Recently, Matsui, Motoki and Kamatani proposed a Markov chain for sampling from discretized Dirichlet distriution [6]. The mixing rate of their chain is ounded y /nn + ln n and so their algorithm is a polynomial time approximate sampler. Propp and Wilson devised a surprising algorithm, called monotone CFTP algoritm or ackward coupling, which produces exact samples from the limit distriution [9, 0]. Monotone CFTP algorithm simulates infinite time transitions of a monotone Markov chain in a proailistically finite time. To employ their result, we propose a new monotone Markov chain whose stationary distriution is a discretized Dirichlet distriution. We also show that the mixing rate of our chain is ounded y nn + ln n n/. This paper is deeply related to authors recent paper [5] which proposes a polynomial time perfect sampling algorithm for two-rowed contingency tales. The idea of the algorithm and outline of the proof descried in this paper is similar to those in our paper [5]. The main difference of these two papers is that when we deal with the uniform distriution on two-rowed contingency tales, the transition proailities of the Markov chain are predetermined. However, in case of discretized Dirichlet distriution, the transition proailities vary with respect to the magnitude of paprameters. Thus, to show the monotonicity and polynomiality of the algorithm proposed in this paper, we need a detailed discussions which appear in Appendix section. Showing two lemmas in Appendix section is easier in case of the uniform distriution on two-rowed contingency tales.

5 Review of Coupling From the Past Algorithm When we simulate an ergodic Markov chain for infinite time, we can gain a sample exactly according to the stationary distriution. Suppose that there exists a chain from infinite past, then a possile state at the present time of the chain for which we can have an evidence of the uniqueness without respect to an initial state of the chain, is a realization of a random sample exactly from the stationary distriution. This is the key idea of CFTP. Suppose that we have an ergodic Markov chain MC with finite state space Ω and transition matrix P. The transition rule of the Markov chain X X can e descried y a deterministic function φ : Ω [0, Ω, called update function, as follows. Given a random numer Λ uniformly distriuted over [0,, update function φ satisfies that Prφx, Λ = y = P x, y for any x, y Ω. We can realize the Markov chain y setting X = φx, Λ. Clearly, update function corresponding to the given transition matrix P is not unique. The result of transitions of the chain from the time t to t t < t with a sequence of random numers λ = λ[t ], λ[t + ],..., λ[t ] [0, t t is denoted y Φ t t x, λ : Ω [0, t t Ω where Φ t t x, λ def. = φφ φx, λ[t ],..., λ[t ], λ[t ]. We say that a sequence λ [0, T satisfies the coalescence condition, when y Ω, x Ω, y = Φ 0 T x, λ. With these preparation, standard Coupling From The Past algorithm is expressed as follows. CFTP Algorithm [9] Step. Set the starting time period T := to go ack, and set λ e the empty sequence. Step. Generate random real numers λ[t ], λ[t + ],..., λ[ T/ ] [0,, and insert them to the head of λ in order, i.e., put λ := λ[t ], λ[t + ],..., λ[ ]. Step 3. Start a chain from each element x Ω at time period T, and run each chain to time period 0 according to the update function φ with the sequence of numers in λ. Note that every chain uses the common sequence λ. Step 4. [ Coalescence check ] If y Ω, x Ω, y = Φ 0 T x, λ, then return y and stop. Else, update the starting time period T := T, and go to Step. Theorem. CFTP Theorem [9] Let MC e an ergodic finite Markov chain with state space Ω, defined y an update function φ : Ω [0, Ω. If the CFTP algorithm terminates with proaility, then the otained value is a realization of a random variale exactly distriuted according to the stationary distriution. Theorem gives a proailistically finite time algorithm for infinite time simulation. However, simulations from all states executed in Step 3 is a hard requirement. 3

6 Suppose that there exists a partial order on the set of states Ω. A transition rule expressed y a deterministic update function φ is called monotone with respect to if λ [0,, x, y Ω, x y φx, λ φy, λ. For ease, we also say that a chain is monotone if the chain has a monotone transition rule. Theorem. monotone CFTP [9, 3] Suppose that a Markov chain defined y an update function φ is monotone with respect to a partially ordered set of states Ω,, and x max, x min Ω, x Ω, x max x x min. Then the CFTP algorithm terminates with proaility, and a sequence λ [0, T satisfies the coalescence condition, i.e., y Ω, x Ω, y = Φ 0 T x, λ, if and only if Φ 0 T x max, λ = Φ 0 T x min, λ. When the given Markov chain satisfies the conditions of Theorem, we can modify CFTP algorithm y sustituting Step 4 a y Step 4. a If y Ω, y = Φ 0 T x max, λ = Φ 0 T x min, λ, then return y and stop. The algorithm otained y the aove modification is called a monotone CFTP algorithm. 3 Perfect Sampling Algorithm In this paper, we denote the set of integers non-negative or positive integers y Z Z +, Z ++. Dirichlet random vector P = P, P,..., P n with non-negative parameters u,..., u n is a vector of random variales that admits the proaility density function Γ n i= u i n i= Γ u i n i= p ui i defined on the set {p, p,..., p n R n p + + p n =, p, p,..., p n > 0} where Γ u is the gamma function. Throughout this paper, we assume that n. For any integer n, we discretize the domain with grid size / and otain a discrete set of integer vectors Ω defined y Ω def. = {x, x,..., x n Z n ++ x i > 0 i, x + + x n = }. A discretized Dirichlet random vector with non-negative parameters u,..., u n is a random vector X = X,..., X n Ω with the distriution Pr[X = x,..., x n ] def. n = C x i / ui where C is the partition function normarizing constant defined y C n x Ω i= x i/ ui. i= def. = 4

7 For any integer, we introduce a set of -dimensional integer vectors Ω def. = {Y, Y Z Y, Y > 0, Y + Y = } and a distriution function f Y, Y u i, u j : Ω [0, ] with non-negative parameters u i, u j defined y f Y, Y u i, u j def. = Cu i, u j, Y ui uj Y def. where Cu i, u j, = Y Y ui,y Ω Y uj is the partition function. We also introduce a vector g 0 u i, u j, g u i, u j,..., g u i, u j defined y g k u i, u j def. = { 0 k = 0 k l= Cu i, u j, l ui l uj k {,,..., } It is clear that 0 = g 0 u i, u j < g u i, u j < < g u i, u j =. We descrie our Markov chain M with state space Ω. Given a current state X Ω, we generate a random numer λ [, n. Then transition X X with respect to λ takes place as follows. Markov chain M Input: A random numer λ [, n. Step : Put i := λ and := X i + X i+. Step : Let k {,,..., } e a unique value satisfying g k u i, u i+ λ λ < g k u i, u i+. k j = i, Step 3: Put X j := k j = i +, otherwise. X j The update function φ : Ω [, n Ω of our chain is defined y φx, λ def. = X where X is determined y the aove procedure. Clearly, this chain is irreducile and aperiodic. Since the detailed alance equations hold, the stationary distriution of the aove Markov chain M is the discretized Dirichlet distriution. We define two special states X U, X L Ω y X U def. = n +,,,...,, X L def. =,,...,, n +. Now we descrie our algorithm. Algorithm Step. Set the starting time period T := to go ack, and λ e the empty sequence. Step. Generate random real numers λ[t ], λ[t + ],..., λ[ T/ ] [, n and put λ = λ[t ], λ[t ],..., λ[ ]. Step 3. Start two chains from X U and X L, respectively at time period T, and run them to time period 0 according to the update function φ with the sequence of numers in λ. 5

8 Step 4. [Coalescence check] If Y Ω, Y = Φ 0 T X U, λ = Φ 0 T X L, λ, then return Y and stop. Else, update the starting time period T := T, and go to Step. Theorem 3. With proaility, Algorithm terminates and returns a state. The state otained y Algorithm is a realization of a sample exactly according to the discretized Dirichlet distriution. The aove theorem guarantees that Algorithm is a perfect sampling algorithm. We prove the aove theorem y showing the monotonicity in the next section. 4 Monotonicity of the Chain In Section, we descried two theorems. Thus we only need to show the monotonicity of our chain. First, we introduce a partial order on the state space Ω. For any vector X Ω, we define the cumulative sum vector c X Z n+ + y { c X i def. 0 i = 0, = X + X + + X i i {,,..., n}, where c X = c X 0, c X,..., c X n. Clearly, there exists a ijection etween Ω and the set {c X X Ω}. For any pair of states X, Y Ω, we say X Y if and only if c X c Y. It is clear that the relation is a partial order on Ω. We can see easily that X Ω, X U X X L. We say that a state X Ω covers Y Ω at j, denoted y X Y or X j Y, when { i = j, c X i c Y i = 0 otherwise. Note that X j Y if and only if + i = j, X i Y i = i = j +, 0 otherwise. Next, we show a key lemma for proving monotonicity. Lemma. X, Y Ω, λ [, n, X j Y φx, λ φy, λ. Proof: We denote φx, λ y X and φy, λ y Y for simplicity. For any index i λ, it is easy to see that c X i = c X i and c Y i = c Y i, and so c X i c Y i = c X i c Y i 0 since X Y. In the following, we show that c X λ c Y λ. Clearly from the definition of our Markov chain, X λ is a unique value k satisfying g k u λ, u λ + λ λ < g k u λ, u λ + 6

9 where def. = X λ + X λ +. Similarly, Y λ is a unique value k satisfying g k u λ, u λ + λ λ < g k u λ, u λ + def. where = Y λ + Y λ +. We need to consider following three cases. Case : If λ j and λ j +, then = and so we have X λ = k = k = Y λ. Case : Consider the case that λ = j. Since X j Y, we have = +. From the definition of cumulative sum vector, c X j c Y j = c X j + X j c Y j Y j = c X j + X j c Y j Y j = X j Y j. Thus, it is enough to show that X j Y j. Lemma 5 in Appendix section implies the following inequalities, 0 = g +0 u j, u j = g 0 u j, u j g + u j, u j g u j, u j g +k u j, u j g k u j, u j g +k u j, u j g + u j, u j g u j, u j = g + u j, u j =, which we will call alternating inequalities. For example, if inequalities g +k u j, u j λ λ < g k u j, u j g +k u j, u j hold, then X λ = k > k = Y λ. And if g +k u j, u j g k u j, u j λ λ < g +k u j, u j hold, then X λ = k = Y λ X j Y j {,. Thus we have,, 3,..., see Figure. From the aove we have that X j Y j. }, g + 0 g + g + g + g g 0 g g g Fig.. A figure of alternating inequalities. In the aove, we denote g k u j, u j and g +k u j, u j y g k, g +k, respectively. Case 3: Consider the case that λ = j +. Since X j Y, we have + =. From the definition of cumulative sum vector, c X j + c Y j + = c X j + X j+ c Y j Y j+ = c X j + X j+ c Y j Y j+ = + X j+ Y j+. 7

10 Thus, it is enough to show that + X j+ Y j+. Then Lemma 5 also implies the following alternating inequalities 0 = g +0 u j+, u j+ = g 0 u j+, u j+ g + u j+, u j+ g u j+, u j+ g +k u j+, u j+ g k u j+, u j+ g +k u j+, u j+ g + u j+, u j+ g u j+, u j+ = g + u j+, u j+ =. Then it is easy to see that { X j+ Y j+,,,,..., 3 }, see Figure. From the aove, we otain the inequality + X j+ Y j+. g 0 g g g 0 3 g + 0 g + g + g + g + Fig.. A figure of alternating inequalities. In the aove, we denote g k u j+, u j+ and g +k u j+, u j+ y g k, g +k, respectively. Lemma. The Markov chain M defined y the update function φ is monotone with respect to, i.e., λ [, n, X, Y Ω, X Y φx, λ φy, λ. Proof: It is easy to see that there exists a sequence of states Z, Z,..., Z r with appropreate length satisfying X = Z Z Z r = Y. Then applying Lemma repeatedly, we can show that φx, λ = φz, λ φz, λ φz r, λ = φy, λ. Lastly, we show the correctness of our algorithm. Proof of Theorem 3: From Lemma, the Markov chain is monotone, and it is clear that X U, X L is a unique pair of maximum and minimum states. Thus Algorithm is a monotone CFTP algorithm and Theorems and implies Theorem 3. 5 Expected Running Time Here, we discuss the running time of our algorithm. In this section, we assume the following. Condition Parameters are arranged in non-increasing order, i.e., u u u n. 8

11 The following is a main result of this paper. Theorem 4. Under Condition, the expected numer of tansitions executed in Algorithm is ounded y On 3 ln where n is the dimension numer of parameters and / is the grid size for discretization. In the rest of this section, we prove Theorem 4 y estimating the expectation def. of coalescence time T Z ++ defined y T = min{t > 0 y Ω, x Ω, y = Φ 0 tx, Λ}. Note that T is a random variale. Given a pair of proailistic distriutions ν and ν on the finite state space Ω, the total variation distance etween ν and ν is defined y D TV ν, ν def. = x Ω ν x ν x. The mixing rate of an ergodic Markov chain is defined y τ def. = max x Ω {min{t s t, D TV π, Px s /e}} where π is the stationary distriution and Px s is the proailistic distriution of the chain at time s with initial state x. Path Coupling Theorem is a useful technique for ounding the mixing rate. Theorem 5. Path Coupling [, ] Let MC e a finite ergodic Markov chain with state space Ω. Let G = Ω, E e a connected undirected graph with vertex Ω set Ω and edge set E. Let l : E R e a positive length defined on the edge set. For any pair of vertices {x, y} of G, the distance etween x and y, denoted y dx, y and/or dy, x, is the length of a shortest path etween x and y, where the length of a path is the sum of the lengths of edges in the path. Suppose that there exists a joint process X, Y X, Y with respect to MC satisfying that whose marginals are a faithful copy of MC and 0 < β <, {X, Y } E, E[dX, Y ] βdx, Y. Then the mixing rate τ of Markov chain MC satisfies τ β +lnd/d, where d def. = min x,y Ω dx, y and D def. = max x,y Ω dx, y. The aove theorem differs from the original theorem in [] since the integrality of the edge length is not assumed. We drop the integrality and introduced the minimum distance d. This modification is not essential and we can show Theorem 5 similarly. Now, we show the polynomiality of our algorithm. First, we estimate the mixing rate of our chain M y employing Path Coupling Theorem. In the proof of the following lemma, Condition plays an important role. Lemma 3. Under Condition, the mixing rate τ of our Markov chain M satisfies τ nn + ln n n/. Proof. Let G = Ω, E e an undirected simple graph with vertex set Ω and edge set E defined as follows. A pair of vertices {X, Y } is an edge if and only if / n i= X i Y i =. Clearly, the graph G is connected. For each edge 9

12 e = {X, Y } E, there exists a unique pair of indices j, j {,..., n}, called the supporting pair of e, satisfying { i = j, j X i Y i =, 0 otherwise. We define the length le of an edge e y le def. = /n j i= n i where j = max{j, j } and {j, j } is the supporting pair of e. Note that min e E le max e E le n/. For each pair X, Y Ω, we define the distance dx, Y e the length of the shortest path etween X and Y on G. Clearly, the diameter of G, i.e., max X,Y Ω dx, Y, is ounded y n n/, since dx, Y n/ n i= / X i Y i n/ n for any X, Y Ω. The definition of edge length implies that for any edge {X, Y } E, dx, Y = l{x, Y }. We define a joint process X, Y X, Y y X, Y φx, Λ, φy, Λ with uniform real random numer Λ [, n where φ is the update function defined in Section 3. Now we show that E[dX, Y ] βdx, Y, β = /nn, for any pair {X, Y } E. In the following, we denote the supporting pair of {X, Y } y {j, j }. Without loss of generality, we can assume that j < j, and X j + = Y j. Case : When Λ = j, we will show that E[dX, Y Λ = j ] dx, Y /n j + /n. In case j = j, X = Y with conditional proailty. Hence dx, Y = 0. In the following, we consider the case j < j. Put = X j + X j and = Y j + Y j. Since X j + = Y j, + = holds. From the definition of the update function of our Marokov chain, we have followings, X j = k [g k u j, u j Λ Λ < g k u j, u j ] Y j = k [g +k u j, u j Λ Λ < g +k u j, u j ]. As descried in the proof of Lemma, the alternating inequalities 0 = g +0 u j, u j = g 0 u j, u j g + u j, u j g u j, u j g + u j, u j g u j, u j = g + u j, u j =, hold. Thus we have X j Y j {,,,,..., 3, }. If X j = Y j, the supporting pair of {X, Y } is {j, j } and so dx, Y = dx, Y. If X j Y j, the supporting pair of {X, Y } is {j, j } and so dx, Y = dx, Y n j + /n. 0

13 Lemma 6 in Appendix section implies that if u j u j, then Pr[X j = Y j Λ = j ] /, Pr[X j Y j Λ = j ] /, y showing that the following inequality Pr[X j Y j Λ = j ] Pr[X j = Y j Λ = j ] = k= [g k u j, u j g +k u j, u j ] k= [g +k u j, u j g k u j, u j ] 0 hold when u j u j see Figure 3. g 0 g g g 0 3 g + 0 g + g + g + g + Fig. 3. A figure of alternating inequalities. In the aove, we denote g k u j, u j and g +k u j, u j y g k, g +k, respectively. Condition is necessary to show Lemma 6. Thus we otain that E[dX, Y Λ = j ] /dx, Y + /dx, Y n j + /n = dx, Y /n j + /n. Case : When Λ = j, we can show that E[dX, Y Λ = j ] dx, Y + /n j /n in a similar way with Case. Case 3: When Λ j and Λ j, it is easy to see that the supporting pair {j, j } of {X, Y } satisfies j = max{j, j }. Thus dx, Y = dx, Y. The proaility of appearance of Case is equal to /n, and that of Case is less than or equal to /n. From the aove, E[dX, Y ] dx, Y n j + + n j n n n n = dx, Y n dx, Y = dx, Y. n max {X,Y } E {dx, Y } nn Since the diameter of G is ounded y n n/, Theorem 5 implies that the mixing rate τ satisfies τ nn + ln n n/. Next, we estimate the coalescence time. When we apply Propp and Wilson s result in [9] straightforwardly, the otained coalescence time is not tight. In the following, we use their technique carefully and derive a etter ound.

14 Lemma 4. Under Condition, the coalescence time T of M satisfies E[T ] = On 3 ln. Proof. Let G = Ω, E e the undirected graph and dx, Y, X, Y Ω, e the metric on G, oth of which are defined in the proof of Lemma 3. We define D def. def. = dx U, X L and τ 0 = nn + ln D. By using the inequality otained in the proof of Lemma 3, we have Pr[T > τ 0 ] = Pr [ Φ 0 τ 0 X U, Λ Φ 0 τ 0 X L, Λ ] = Pr [Φ τ0 0 X U, Λ Φ τ0 0 X L, Λ] X,Y Ω dx, Y Pr [X = Φτ0 = E [d Φ τ0 0 X U, Λ, Φ τ0 0 X L, Λ] = 0 X U, Λ, Y = Φ τ0 0 X L, Λ] τ0 nn dx U, X L nn +ln D nn D e e ln D D e. By sumultiplicativity of coalescence time [9], for any k Z +, Pr[T > kτ 0 ] Pr[T > τ 0 ] k /e k. Thus E[T ] = t=0 tpr[t = t] τ 0 + τ 0 Pr[T > τ 0 ] + τ 0 Pr[T > τ 0 ] + τ 0 + τ 0 /e + τ 0 /e + = τ 0 / /e τ 0. Clearly, D n n/ ecause n. Then we otain the result that E[T ] = On 3 ln. Lastly, we ound the expected numer of transitions executed in Algorithm. Proof of Theorem 4: We denote T e the coalescence time of our chain. Clearly T is a random variale. Put K = log T. Algorithm terminates when we set the starting time period T = K at K + st iteration. Then the total numer of simulated transitions is ounded y K < K 8T, since we need to execute two chains from oth X U and X L. Thus the expectation of total numer of transitions of M is ounded y OE[8T ] = On 3 ln N. We can assume Condition y sorting parameters in On ln n time. 6 Conclusion In this paper, we proposed a monotone Markov chain whose stationary distriution is a discretized Dirichlet distriution. By employing Propp and Wilson s result, on monotone CFTP algorithm, we can construct a perfect sampling algorithm. We showed that our Markov chain is rapidly mixing y using path coupling method. Thus the rapidity implies that our perfect sampling algorithm is a polynomial time algorithm. The otained time complexity does not depend on the magnitudes of parameters. We can reduce the memory requirement y employing Wilson s read-once algorithm in [].

15 Appendix Lemma 5 is essentialy equivalent to the lemma appearing in Appendix section of the paper [6] y Matsui, Motoki and Kamatani which deals with an approximate sampler for discretized Dirichlet distriution. Lemma 6 is a new result otained in this paper. Lemma 5. {, 3,...}, u i, u j 0, k {,,..., }, g + k u i, u j g k u i, u j g + k u i, u j. Proof: In the following, we show the second inequality. We can show the first inequality in a similar way. We denote Cu i, u j, + = C + and Cu i, u j, = C for simplicity. From the definirion of g k u i, u j, we otain that Hk def. = g + k u i, u j g k u i, u j = k l= C +l ui l + uj k l= C l ui l uj = C + l=k+ lui l + uj C l=k lui l uj = C l=k+ l ui l + uj C + l=k+ lui l + uj = l=k+ C l ui l + uj C + l ui l + uj = l=k+ C l ui l + uj ui C l + C. Similarly, we can also show that Hk = C + k l= lui l + uj C k l= lui l uj C + k l= lui l + uj C k l= l ui l + uj = k l= C +l ui l + uj C l ui l + uj = k l= C l ui l + uj C+ C l ui. By introducing the function h : {, 3,..., } R defined y hl def. = ui l C + C, we have the following equality and inequality Hk = l=k+ C l ui l + uj hl k l= C l ui l + uj hl. 3 a Consider the case that u i. Since u i 0, the function hl is monotone non-decreasing. When hk 0 holds, we have 0 hk hk + h, and so implies the nonnegativity Hk 0. If hk < 0, then inequalities h h3 hk < 0 hold, and so 3 implies that Hk k l= C l ui l + uj hl 0. 3

16 Consider the case that 0 u i. Since u i 0, the function hl is monotone non-increasing. If the inequality h 0 hold, we have h h3 h 0 and inequality implies the non-negativity Hk 0. Thus, we only need to show that h = ui C + C 0. In the rest of this proof, we sustitute u i y α i for all i. We define a function H 0, α i, α j y H 0, α i, α j def. = αi C + αi C. It is clear that if the condition [ α i 0, α j, {, 3, 4,...}, H 0, α i, α j 0] holds, we oatin the required result that h 0 for each {, 3, 4,...}. Now we transform the function H 0, α i, α j and otain another expression as follows; H 0, α i, α j = αi k αi k + αj αi k αi k αj k= = k= αi k αi k + = k= [ αi k αi k + αj = α i k α i k α j k= αj k+k k αi k αi k αj ] [ + k k= αi k= kαi k αj + αi k + αi k αj αj k + + k αi k k αi ]. Then it is enough to show that the function H, α i, α j, k def. = + αj αi k k + + αi k k is nonnegative for any k {,,..., }. Since + / k > and α j, we have H, α i, α j, k H, α i,, k = k k+ + + αi αi k k. We differentiate the function H y α i, and otain the following H, α i,, k = + αi αi α k k log + k log i = + k αi log + k k + αi log +. Since k, is a pair of positive integers satisfying k, the non-positivity of α i implies 0 + /k αi + / αi and 0 log + /k k log + /. Thus the function H, α i,, k is monotone non-decreasing with respect to α i 0. Thus we have H, α i,, k H, 0,, k = k k k k = k k+ + k + = k + k k+ = k+ 0. 4

17 Lemma 6. {, 3,...}, u i u j, [g k u i, u j g + k u i, u j ] [g + k u i, u j g k u i, u j ] 0. k= k= Proof: We denote Cu i, u j, + = C + and Cu i, u j, = C for simplicity. It is not difficult to show the following equalities, G def. = k= [g k u i, u j g + k u i, u j ] k= [g +k u i, u j g k u i, u j ] = k= g k u i, u j k= g +k u i, u j k= g +k u i, u j + k= g k u i, u j = k= g k u i, u j k= g +k u i, u j k= g +k u i, u j + k= g k u i, u j = k= C k= C + k l= lui l uj k= C + k l= lui l + uj + k= C k l= lui l + uj k l= lui l uj = C l= llui l uj C + l= llui l + uj C + l= llui l + uj + C l= l lui l uj = C l= l lui l uj C + l= llui l + uj = C C + C + l= l lui l uj C l= llui l + uj = C C + k= kui k + uj l= l lui l uj k= kui k uj l= llui l + uj = C C + k= l= l klui k + l uj k= l= lklui k l + uj = C C + k= l= l klui k + l uj k= = C C + k= = C C + l= k= + k= = C C + k= + k= l= kklui l k + uj k l kl u i uj k + l uj l= k l klui k + l l= k l klui k + l uj l= k l klui k + l uj l= k + l k + l ui uj k + + l 5

18 = C C + k= = C C + = C C + = C C + k= l= uj l= k l klui k + l l= k l k + lui kl uj k 3 l= k=l+ l 3 : k= l=k k=l+ k l klui k + l uj + k= l=k k l klui k + l uj l= k=l+ k l k + lui kl uj k= l=k k l k + lui kl uj k=l+ k l klui k + l uj l= l= l= + l= l= k=l+ k=l+ k l lk ui l + k + uj k=l+ k l k + lui kl uj k=l+ k l l + k + ui lk uj k l kl ui k + l uj lk ui l + k + uj k + l ui kl uj + l + k + ui lk uj. We define a function G 0 k, l, u i, u j y kl ui uj k + l G 0 k, l, u i, u j def. = lk ui l + k + k + l ui kl uj + l + k + ui lk uj uj Since l < l+ k, it is clear that k l > 0. Thus, we only need to show that l {,,..., }, k {, 3,..., }, u i u j, G 0 l, k, u i, u j 0. It is easy to see that G 0 k, l, u i, u j = lk ui k + uj + k ui l uj l + uj. + k + l ui l k uj uj + + l ui k uj Then it is clear that G 0 k, l, u i, u j is non-decreasing with respect to u i and so G 0 k, l, u i, u j G 0 k, l, u j, u j. By sustituting u i y u j in the definition. 6

19 of G 0 k, l, u i, u j, it is easy to see that G 0 k, l, u j, u j = 0. Thus we have the desired result. References. Buley, R., M. Dyer, M.: Path coupling: A technique for proving rapid mixing in Markov chains, 38th Annual Symposium on Foundations of Computer Science, IEEE, San Alimitos, 997, Buley, R.: Randomized Algorithms : Approximation, Generation, and Counting, Springer-Verlag, New York, Dimakos, X. K.: A guide to exact simulation, International Statistical Review, 69 00, pp Durin, R., Eddy, R., Krogh, A., Mitchison, G.: Biological sequence analysis: proailistic models of proteins and nucleic acids, Camridge Univ. Press, Kijima, S. and Matsui, T.,: Polynomial Time Perfect Sampling Algorithm for Two-rowed Contingency Tales, METR 003-5, Mathematical Engineering Technical Reports, University of Tokyo, 003. availale from 6. Matsui, T., Motoki, M., and Kamatani, N.,: Polynomial Time Approximate Sampler for Discretized Dirichlet Distriution, METR 003-0, Mathematical Engineering Technical Reports, University of Tokyo, 003. availale from 7. Niu, T., Qin, Z. S., Xu, X., Liu, J. S.: Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms, Am. J. Hum. Genet., Pritchard, J. K., Stephens, M., Donnely, P.: Inference of population structure using multilocus genotype data, Genetics, Propp, J. and Wilson, D.: Exact sampling with coupled Markov chains and applications to statistical mechanics, Random Structures and Algorithms, 9 996, pp Propp, J. and Wilson, D.: How to get a perfectly random sample from a generic Markov chain and generate a random spanning tree of a directed graph, J. Algorithms, 7 998, pp Roert, C. P.: The Bayesian Choice, Springer-Verlag, New York, 00.. Wilson, D.: How to couple from the past using a read-once source of randomness, Random Structures and Algorithms, 6 000, pp

APPROXIMATE/PERFECT SAMPLERS FOR CLOSED JACKSON NETWORKS. Shuji Kijima Tomomi Matsui

Proceedings of the 005 Winter Simulation Conference M. E. Kuhl, N. M. Steiger, F. B. Armstrong, and J. A. Joines, eds. APPROXIMATE/PERFECT SAMPLERS FOR CLOSED JACKSON NETWORKS Shuji Kijima Tomomi Matsui