Detailed Proofs of Lemmas, Theorems, and Corollaries
Dahua Lin (CSAIL, MIT) and John Fisher (CSAIL, MIT)

A List of Lemmas, Theorems, and Corollaries

To keep this document self-contained, we first list all the lemmas, theorems, and corollaries in the main paper.

Lemma 1. The joint transition matrix $P_+$ given by Eq.(1) in the main paper has a stationary distribution of the form $(\alpha \mu_X, \beta \mu_Y)$ if and only if
$$\mu_X Q_B = \mu_Y, \quad \text{and} \quad \mu_Y Q_F = \mu_X. \qquad (1)$$
Under this condition, we have $\alpha b = \beta f$. Further, if both $P_X$ and $P_Y$ are reversible, then $P_+$ is also reversible if and only if
$$\mu_X(x)\, Q_B(x, y) = \mu_Y(y)\, Q_F(y, x), \qquad (2)$$
for all $x \in X$ and $y \in Y$.

Lemma 2. If the condition given above holds, then $\mu_X$ and $\mu_Y$ are respectively stationary distributions of $Q_{BF}$ and $Q_{FB}$. Moreover, if $P_+$ is reversible, then both $Q_{BF}$ and $Q_{FB}$ are reversible.

Theorem 1. Given a reversible Markov chain with transition matrix $P$, and $\varepsilon \in (0, 1/2)$, we have
$$\log\big(1/(2\varepsilon)\big)\,(\tau - 1) \;\le\; t_{\mathrm{mix}}(\varepsilon) \;\le\; \log\big(1/(\varepsilon \mu_{\min})\big)\,\tau. \qquad (3)$$
Here, $\tau$ is called the relaxation time, given by $\tau = 1/\gamma(P)$, where $\gamma(P)$ is the spectral gap of $P$.

Theorem 2. Let $\lambda_2$ be the second largest eigenvalue of a reversible transition matrix $P$; then
$$\Phi_*^2(P)/2 \;\le\; 1 - \lambda_2 \;\le\; 2\,\Phi_*(P). \qquad (4)$$

Theorem 3. The reversible transition matrix $P_+$ as given by Eq.(1) in the main paper has
$$\frac{\eta(b, f)}{2} \cdot \frac{\phi}{\phi + 1} \;\le\; \Phi_*(P_+) \;\le\; \max\{b, f\}. \qquad (5)$$
Here, $\eta(b, f) = 2\alpha b = 2\beta f$ is the total probability of cross-space transition, and $\phi = \min\{\Phi_*(Q_{BF}), \Phi_*(Q_{FB})\}$.

Corollary 1. The joint chain $P_+$ is ergodic when the collapsed chains ($Q_{BF}$ and $Q_{FB}$) are both ergodic.

Lemma 3. Let $P$ be a reversible transition matrix over $X$ such that $P(x, x) \ge \xi > 0$ for each $x \in X$; then its smallest eigenvalue $\lambda_n$ satisfies $\lambda_n \ge 2\xi - 1$.

Theorem 4. The hierarchically bridging Markov chain with $0 < b_k < 1$ for $k = 0, \ldots, L - 1$, and $0 < f_k < 1$ for $k = 1, \ldots, L$, is ergodic, where $L$ is the top (root) level of the hierarchy.
If we write the equilibrium distribution in the form $(\alpha \mu_0, \beta_1 \mu_1, \ldots, \beta_L \mu_L)$, then (S1) $\mu_0$ equals the target distribution $\mu$; (S2) for each $k \ge 1$ and $y \in Y_k$, $\mu_k(y)$ is proportional to the total probability of its descendant target states (the target states derived by filling all its placeholders); (S3) $\alpha$, the probability of being at the target level, is given by
$$\alpha = \Big(1 + \sum_{k=1}^{L} \frac{b_0 \cdots b_{k-1}}{f_1 \cdots f_k}\Big)^{-1}.$$

Corollary 2. If $b_{k-1}/f_k \le \kappa < 1$ for each $k = 1, \ldots, L$, then $\alpha > 1 - \kappa$.

B Proofs

Here, we provide the proofs of the lemmas and theorems presented in the paper.

B.1 Proof of Lemma 1

Recall that $P_+$ is given by
$$P_+ = \begin{bmatrix} (1-b)\, P_X & b\, Q_B \\ f\, Q_F & (1-f)\, P_Y \end{bmatrix}.$$
Suppose $P_+$ has a stationary distribution of the form $(\alpha \mu_X, \beta \mu_Y)$. Then
$$\alpha(1-b)\,\mu_X P_X + \beta f\, \mu_Y Q_F = \alpha \mu_X, \qquad \alpha b\, \mu_X Q_B + \beta(1-f)\,\mu_Y P_Y = \beta \mu_Y. \qquad (6)$$
Since $\mu_X$ and $\mu_Y$ are respectively stationary distributions of $P_X$ and $P_Y$, i.e. $\mu_X P_X = \mu_X$ and $\mu_Y P_Y = \mu_Y$, we have
$$\beta f\, \mu_Y Q_F = \alpha b\, \mu_X, \quad \text{and} \quad \alpha b\, \mu_X Q_B = \beta f\, \mu_Y. \qquad (7)$$
Note that $Q_B$ and $Q_F$ were defined with the condition that each of their rows sums to $1$, i.e. $Q_F \mathbf{1}_X = \mathbf{1}_Y$ and $Q_B \mathbf{1}_Y = \mathbf{1}_X$. Multiplying $\mathbf{1}$ to the right of both sides of either equation yields $\alpha b = \beta f$. It immediately follows that
$$\mu_X Q_B = \mu_Y, \quad \text{and} \quad \mu_Y Q_F = \mu_X. \qquad (8)$$
For the other direction, we assume $Q_B$ and $Q_F$ satisfy the conditions above, and $\alpha b = \beta f$. Plugging these conditions into the left-hand sides of the equations in Eq.(6) yields $(\alpha \mu_X, \beta \mu_Y)\, P_+ = (\alpha \mu_X, \beta \mu_Y)$, which implies that $(\alpha \mu_X, \beta \mu_Y)$ is a stationary distribution of $P_+$. This proves the first part.

Next, we show the second part of the lemma, which concerns reversibility. Let $\mu_+ = (\alpha \mu_X, \beta \mu_Y)$. Under the condition that $P_X$ and $P_Y$ are both reversible, we have for each $x, x' \in X$,
$$\mu_+(x)\, P_+(x, x') = \alpha(1-b)\,\mu_X(x)\, P_X(x, x'), \qquad \mu_+(x')\, P_+(x', x) = \alpha(1-b)\,\mu_X(x')\, P_X(x', x), \qquad (9)$$
and thus $\mu_+(x)\, P_+(x, x') = \mu_+(x')\, P_+(x', x)$ (due to the reversibility of $P_X$). Likewise, we get $\mu_+(y)\, P_+(y, y') = \mu_+(y')\, P_+(y', y)$ for $y, y' \in Y$. Hence, $P_+$ is reversible if and only if $\mu_+(x)\, P_+(x, y) = \mu_+(y)\, P_+(y, x)$ for each $x \in X$ and $y \in Y$. This can be expanded as
$$\alpha b\, \mu_X(x)\, Q_B(x, y) = \beta f\, \mu_Y(y)\, Q_F(y, x), \qquad (10)$$
which holds if and only if $\mu_X(x)\, Q_B(x, y) = \mu_Y(y)\, Q_F(y, x)$ (under the condition $\alpha b = \beta f$). The proof is completed.

B.2 Proof of Lemma 2

If $(\alpha \mu_X, \beta \mu_Y)$ is a stationary distribution of $P_+$, then Eq.(8) holds. Thus,
$$\mu_X Q_B Q_F = \mu_Y Q_F = \mu_X, \qquad (11)$$
implying that $\mu_X$ is a stationary distribution of $Q_{BF} = Q_B Q_F$. Similarly, we can show that $\mu_Y$ is a stationary distribution of $Q_{FB} = Q_F Q_B$.

Furthermore, if $P_+$ is reversible, then according to Lemma 1, we have $\mu_X(x)\, Q_B(x, y) = \mu_Y(y)\, Q_F(y, x)$ for each $x \in X$ and $y \in Y$. Then for any $x, x' \in X$,
$$\mu_X(x)\, Q_{BF}(x, x') = \mu_X(x) \sum_{y \in Y} Q_B(x, y)\, Q_F(y, x') = \sum_{y \in Y} \mu_X(x)\, Q_B(x, y)\, Q_F(y, x') = \sum_{y \in Y} \mu_Y(y)\, Q_F(y, x)\, Q_F(y, x').$$
Similarly, we can get
$$\mu_X(x')\, Q_{BF}(x', x) = \sum_{y \in Y} \mu_Y(y)\, Q_F(y, x')\, Q_F(y, x).$$
Hence,
$$\mu_X(x)\, Q_{BF}(x, x') = \mu_X(x')\, Q_{BF}(x', x). \qquad (12)$$
This implies that $Q_{BF}$ is reversible. Likewise, we can show that $Q_{FB}$ is reversible under the condition that $P_+$ is reversible. The proof is completed.

B.3 Proof of Theorem 3

To prove Theorem 3, we first establish a lemma on flow decomposition, and then complete the proof based on that lemma.

B.3.1 A Lemma on Flow Decomposition

For the joint chain $P_+$, we analyze its bottleneck ratio by decomposing the flows.
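On a small instance, all of the objects used in this section can be computed by brute force, which gives a concrete sanity check of Lemma 1 and of the bounds in Theorem 3. The following pure-Python sketch uses hypothetical choices of $\mu_X$, $Q_B$, $b$, and $f$ (none taken from the main paper); $Q_F$ is derived from the detailed-balance condition Eq.(2), and the in-space chains are taken to be the trivially reversible kernels $P(x, x') = \mu(x')$:

```python
from itertools import combinations

# Illustrative check of Lemma 1 and Theorem 3 on a small example.
# All numeric choices below are hypothetical.
mu_X = [0.5, 0.3, 0.2]                      # target distribution on X = {0,1,2}
Q_B = [[0.7, 0.3],                          # backward bridge X -> Y (rows sum to 1)
       [0.4, 0.6],
       [0.2, 0.8]]
mu_Y = [sum(mu_X[x] * Q_B[x][y] for x in range(3)) for y in range(2)]  # mu_Y = mu_X Q_B
# Q_F chosen via the detailed-balance condition Eq.(2); this also gives mu_Y Q_F = mu_X.
Q_F = [[mu_X[x] * Q_B[x][y] / mu_Y[y] for x in range(3)] for y in range(2)]
# In-space chains: P(x, x') = mu(x') is trivially reversible w.r.t. mu.
P_X = [list(mu_X) for _ in range(3)]
P_Y = [list(mu_Y) for _ in range(2)]

b, f = 0.5, 0.25
alpha, beta = f / (b + f), b / (b + f)      # normalized so that alpha*b = beta*f

# Assemble the joint chain P_+ of Eq.(1) over 5 states (X first, then Y).
P = [[0.0] * 5 for _ in range(5)]
for x in range(3):
    for x2 in range(3):
        P[x][x2] = (1 - b) * P_X[x][x2]
    for y in range(2):
        P[x][3 + y] = b * Q_B[x][y]
for y in range(2):
    for x in range(3):
        P[3 + y][x] = f * Q_F[y][x]
    for y2 in range(2):
        P[3 + y][3 + y2] = (1 - f) * P_Y[y][y2]
mu = [alpha * m for m in mu_X] + [beta * m for m in mu_Y]

# Lemma 1: (alpha mu_X, beta mu_Y) is stationary for P_+.
for j in range(5):
    assert abs(sum(mu[i] * P[i][j] for i in range(5)) - mu[j]) < 1e-12

def bottleneck(T, pi):
    """Bottleneck ratio Phi_*: min over S with pi(S) <= 1/2 of F(S, S^c)/pi(S)."""
    n = len(pi)
    best = float('inf')
    for r in range(1, n):
        for S in combinations(range(n), r):
            pi_S = sum(pi[i] for i in S)
            if pi_S > 0.5 + 1e-12:
                continue
            flow = sum(pi[i] * T[i][j] for i in S for j in range(n) if j not in S)
            best = min(best, flow / pi_S)
    return best

# Collapsed chains Q_BF = Q_B Q_F and Q_FB = Q_F Q_B (Lemma 2).
Q_BF = [[sum(Q_B[x][y] * Q_F[y][x2] for y in range(2)) for x2 in range(3)] for x in range(3)]
Q_FB = [[sum(Q_F[y][x] * Q_B[x][y2] for x in range(3)) for y2 in range(2)] for y in range(2)]

eta = 2 * alpha * b                         # total cross-space transition probability
phi = min(bottleneck(Q_BF, mu_X), bottleneck(Q_FB, mu_Y))
Phi_plus = bottleneck(P, mu)

# Theorem 3: (eta/2) * phi/(phi+1) <= Phi_*(P_+) <= max(b, f).
assert (eta / 2) * phi / (phi + 1) <= Phi_plus <= max(b, f) + 1e-12
```

The bottleneck ratio is computed here by exhaustive enumeration of subsets, which is only feasible for toy state spaces.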
Consider a partition of the union space $X \cup Y$ into two parts: $A \cup B$ (with $A \subset X$ and $B \subset Y$) and $A^c \cup B^c$ (with $A^c = X \setminus A$ and $B^c = Y \setminus B$). The flow between them comprises three parts: $F(A, A^c) + F(B, B^c) + (F(A, B^c) + F(B, A^c))$. Here, $F(A, A^c)$ is the flow within $X$, $F(B, B^c)$ is the flow within $Y$, and $F(A, B^c) + F(B, A^c)$ is the flow between $X$ and $Y$. The first two are inherited from the original Markov chains. We focus on the third one, which reflects the effect of bridging. For this part of the flow, we derive the following lemma by decomposing it along multiple paths.

Lemma 4. Given an arbitrary partition of $X \cup Y$ into $A \cup B$ and $A^c \cup B^c$ as described above, we have
$$F(A, B^c) + F(B, A^c) \;\ge\; \alpha b\, \Phi_*(Q_{BF})\, \mu_X(A), \qquad (13)$$
when $\mu_X(A) \le \mu_X(A^c)$, and
$$F(A, B^c) + F(B, A^c) \;\ge\; \beta f\, \Phi_*(Q_{FB})\, \mu_Y(B), \qquad (14)$$
when $\mu_Y(B) \le \mu_Y(B^c)$.

Proof. To analyze the flow $F(A, B^c) + F(B, A^c)$, we further decompose it along multiple paths. Based on Eq.(1) in the main paper, we have
$$F(A, B^c) = \sum_{x \in A} \sum_{y \in B^c} \alpha b\, \mu_X(x)\, Q_B(x, y) = \alpha b \sum_{x \in A} \sum_{y \in B^c} \mu_X(x)\, Q_B(x, y) \sum_{x' \in X} Q_F(y, x'). \qquad (15)$$
In this way, we decompose the flow into a sum of terms of the form $\mu_X(x)\, Q_B(x, y)\, Q_F(y, x')$, which we call the path weight along $x \to y \to x'$, denoted by $\omega(x, y, x')$. We can then rewrite $F(A, B)$ as
$$F(A, B) = \alpha b \sum_{x \in A} \sum_{y \in B} \sum_{x' \in X} \omega(x, y, x'). \qquad (16)$$
For conciseness, we use $\omega(A, B, C)$ to denote the sum of the weights of paths starting in $A$, traveling via $B$, and ending in $C$, i.e. $\sum_{x \in A} \sum_{y \in B} \sum_{x' \in C} \omega(x, y, x')$. Then, we have
$$F(A, B^c) = \alpha b\, \big(\omega(A, B^c, A) + \omega(A, B^c, A^c)\big), \qquad (17)$$
$$F(A^c, B) = \alpha b\, \big(\omega(A^c, B, A) + \omega(A^c, B, A^c)\big). \qquad (18)$$
As $F$ is symmetric for a reversible chain, we have $F(B, A^c) = F(A^c, B)$, and thus
$$F(A, B^c) + F(B, A^c) \;\ge\; \alpha b\, \big(\omega(A, B^c, A^c) + \omega(A^c, B, A)\big) = \alpha b\, \omega(A, Y, A^c), \qquad (19)$$
where the last equality uses the symmetry $\omega(x, y, x') = \omega(x', y, x)$, which follows from Eq.(2). On the other hand, we note that
$$\omega(A, Y, A^c) = \sum_{x \in A} \sum_{y \in Y} \sum_{x' \in A^c} \omega(x, y, x') = \sum_{x \in A} \sum_{x' \in A^c} \mu_X(x) \sum_{y \in Y} Q_B(x, y)\, Q_F(y, x') = \sum_{x \in A} \sum_{x' \in A^c} \mu_X(x)\, Q_{BF}(x, x'). \qquad (20)$$
This is exactly the flow from $A$ to $A^c$ with respect to the collapsed chain $Q_{BF}$, i.e.
$$\omega(A, Y, A^c) = F_{Q_{BF}}(A, A^c). \qquad (21)$$
Assuming $\mu_X(A) \le \mu_X(A^c)$, we have $F_{Q_{BF}}(A, A^c) \ge \Phi_*(Q_{BF})\, \mu_X(A)$, by the definition of the bottleneck ratio. Combining this with Eq.(19) results in
$$F(A, B^c) + F(B, A^c) \;\ge\; \alpha b\, \omega(A, Y, A^c) \;\ge\; \alpha b\, \Phi_*(Q_{BF})\, \mu_X(A). \qquad (22)$$
Likewise, with the assumption $\mu_Y(B) \le \mu_Y(B^c)$, we have
$$F(A, B^c) + F(B, A^c) \;\ge\; \beta f\, \Phi_*(Q_{FB})\, \mu_Y(B). \qquad (23)$$
The proof of the lemma is completed.

B.3.2 The Proof of the Main Theorem

Let $\mu_+ = (\alpha \mu_X, \beta \mu_Y)$ be the stationary distribution of $P_+$. For conciseness, we let $\bar{F}(A, B) = F(A \cup B, A^c \cup B^c)$; when $A$ and $B$ are clear from the context, we simply write $\bar{F}$. Then the bottleneck ratio of $P_+$ is the minimum of the values $\bar{F}(A, B)/\mu_+(A \cup B)$, among all possible choices of $A \subset X$ and $B \subset Y$ such that $\mu_+(A \cup B) \le 1/2$ and $A \cup B \neq \emptyset$. Throughout this proof, we assume $A \subset X$, $B \subset Y$, and $\mu_+(A \cup B) \le 1/2$, i.e. $\alpha\, \mu_X(A) + \beta\, \mu_Y(B) \le 1/2$. Under this assumption, there are three cases, which we discuss in turn.

Case 1. $\mu_X(A) \le 1/2$ and $\mu_Y(B) \le 1/2$. Recall that $\phi = \min\{\Phi_*(Q_{BF}), \Phi_*(Q_{FB})\}$ in the theorem. In addition, $\bar{F} \ge F(A, B^c) + F(B, A^c)$. Combining this with Lemma 4, we get
$$\bar{F} \ge \alpha b\, \phi\, \mu_X(A), \quad \text{and} \quad \bar{F} \ge \beta f\, \phi\, \mu_Y(B). \qquad (24)$$
Note that $\eta/2 = \alpha b = \beta f$ and $\alpha + \beta = 1$. Taking the convex combination of the two bounds in Eq.(24) with weights $\alpha$ and $\beta$ gives $\bar{F} \ge (\eta\phi/2)\big(\alpha\,\mu_X(A) + \beta\,\mu_Y(B)\big)$, and thus
$$\frac{\bar{F}}{\mu_+(A \cup B)} = \frac{\bar{F}}{\alpha\,\mu_X(A) + \beta\,\mu_Y(B)} \;\ge\; \frac{\eta \phi}{2}. \qquad (25)$$

Case 2. $\mu_X(A) < 1/2$ and $\mu_Y(B) > 1/2$. Given arbitrary $\kappa > 2$, there are two possibilities:

Case 2.1. $1/\kappa \le \mu_X(A) < 1/2$ and $\mu_Y(B) > 1/2$. Then
$$\bar{F} \ge \alpha b\, \phi\, \mu_X(A) \ge \frac{\alpha b\, \phi}{\kappa}. \qquad (26)$$
Recall that $\mu_+(A \cup B) \le 1/2$. Thus
$$\frac{\bar{F}}{\mu_+(A \cup B)} \;\ge\; \frac{2\alpha b\, \phi}{\kappa} = \frac{\eta \phi}{\kappa}. \qquad (27)$$
Case 2.2. $\mu_X(A) < 1/\kappa$ and $\mu_Y(B) > 1/2$.
Here, we utilize the following fact: $F(B, A^c) = F(B, X) - F(B, A)$. Then, by the definition of flow, we have
$$F(B, X) = \beta f\, \mu_Y(B) > \beta f / 2, \qquad (28)$$
and by the symmetry of $F$ (due to reversibility),
$$F(B, A) = F(A, B) \le F(A, Y) = \alpha b\, \mu_X(A) < \alpha b / \kappa. \qquad (29)$$
With $\alpha b = \beta f$, combining the results above leads to
$$\bar{F} \ge F(B, A^c) > \beta f/2 - \alpha b/\kappa = \alpha b\, (1/2 - 1/\kappa). \qquad (30)$$
As a result, we get
$$\frac{\bar{F}}{\mu_+(A \cup B)} > 2\alpha b\, (1/2 - 1/\kappa) = \frac{\eta}{2}\,(1 - 2/\kappa). \qquad (31)$$

Case 3. $\mu_X(A) > 1/2$ and $\mu_Y(B) < 1/2$. Following an argument similar to that developed above for Case 2, with the roles of $A$ and $B$ exchanged, we can likewise get, for a given $\kappa > 2$,
$$\frac{\bar{F}}{\mu_+(A \cup B)} \;\ge\; \begin{cases} \dfrac{\eta \phi}{\kappa} & (\mu_Y(B) \ge 1/\kappa), \\[2mm] \dfrac{\eta}{2}\,(1 - 2/\kappa) & (\mu_Y(B) < 1/\kappa). \end{cases} \qquad (32)$$

Note that $\mu_X(A) > 1/2$ and $\mu_Y(B) > 1/2$ cannot hold simultaneously under the assumption $\alpha\,\mu_X(A) + \beta\,\mu_Y(B) \le 1/2$. Integrating the results derived for all cases, we obtain
$$\frac{\bar{F}(A, B)}{\mu_+(A \cup B)} \;\ge\; \frac{\eta}{2} \min\left\{ \frac{2}{\kappa}\,\phi,\; 1 - \frac{2}{\kappa} \right\}, \qquad \forall\, \kappa > 2. \qquad (33)$$
Note that this inequality holds for all $A$ and $B$ with $0 < \mu_+(A \cup B) \le 1/2$. In this way, we get a series of lower bounds of the bottleneck ratio, using different values of $\kappa$, and the supremum of these lower bounds remains a lower bound. It is easy to see that the supremum is attained when $2\phi/\kappa = 1 - 2/\kappa$, i.e. at $\kappa = 2(\phi + 1)$, leading to
$$\sup_{\kappa > 2}\, \min\left\{ \frac{2}{\kappa}\,\phi,\; 1 - \frac{2}{\kappa} \right\} = \frac{\phi}{\phi + 1}. \qquad (34)$$
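The supremum in Eq.(34) is an elementary one-dimensional optimization; it can be double-checked numerically, e.g. with an arbitrary illustrative value of $\phi$:

```python
# Numerical check of Eq.(34): sup over kappa > 2 of min(2*phi/kappa, 1 - 2/kappa)
# equals phi/(phi + 1), attained at kappa = 2*(phi + 1). The value of phi is illustrative.
phi = 0.37

def objective(kappa):
    return min(2 * phi / kappa, 1 - 2 / kappa)

kappa_star = 2 * (phi + 1)
# Sweep kappa over a fine grid in (2, 202] and keep the best value found.
best = max(objective(2 + i * 1e-3) for i in range(1, 200001))

assert abs(objective(kappa_star) - phi / (phi + 1)) < 1e-12   # attained at kappa*
assert best <= phi / (phi + 1) + 1e-9                          # never exceeds the sup
assert best >= phi / (phi + 1) - 1e-3                          # grid gets close to it
```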
It follows that
$$\Phi_*(P_+) \;\ge\; \frac{\eta}{2} \cdot \frac{\phi}{\phi + 1}. \qquad (35)$$
This completes the proof of the lower bound.

Next, we show the upper bound, which is easier. By the definition of the bottleneck ratio, for any given partition of $X \cup Y$, the flow ratio derived from that partition constitutes an upper bound of $\Phi_*(P_+)$. Here, we consider the partition with one part being $X$ and the other being $Y$. Then
$$\bar{F} = \alpha b = \beta f, \qquad (36)$$
and thus the flow ratio is given by
$$\frac{\alpha b}{\min(\alpha, \beta)} = \max(b, f). \qquad (37)$$
This gives an upper bound of $\Phi_*(P_+)$. The proof of the theorem is completed.

B.4 Proof of Corollary 1

When both $Q_{BF}$ and $Q_{FB}$ are ergodic, their bottleneck ratios are positive. According to Theorem 3, the bottleneck ratio $\Phi_*(P_+)$ is thus positive, implying that the spectral gap of $P_+$ is positive (by Theorem 2), and hence $P_+$ is irreducible. Since $Q_{BF}$ is ergodic and thus aperiodic, the greatest common divisor of the loop lengths at each $x \in X$ is $1$. A similar argument applies to each $y \in Y$. Consequently, the joint chain characterized by $P_+$ is aperiodic. Being an irreducible and aperiodic finite Markov chain, the joint chain given by $P_+$ is ergodic. This completes the proof.

B.5 Proof of Lemma 3

Let $\tilde{P} = (P - \xi I)/(1 - \xi)$. Since $P$ has $P(x, x) \ge \xi$ for each $x \in X$, the entries of the matrix $\tilde{P}$ are all non-negative. In addition,
$$\tilde{P}\,\mathbf{1} = \frac{(P - \xi I)\,\mathbf{1}}{1 - \xi} = \frac{(1 - \xi)\,\mathbf{1}}{1 - \xi} = \mathbf{1}. \qquad (38)$$
This implies that $\tilde{P}$ is also a stochastic matrix. Since $P$ is reversible, all its eigenvalues are real numbers. Without loss of generality, we assume they are $\lambda_1 \ge \cdots \ge \lambda_n$. As $P$ is a stochastic matrix, we have $\lambda_1 = 1$ and $\lambda_n \ge -1$. According to the spectral mapping theorem, the eigenvalues of $\tilde{P}$, denoted by $\tilde{\lambda}_1, \ldots, \tilde{\lambda}_n$, are given by $\tilde{\lambda}_i = (\lambda_i - \xi)/(1 - \xi)$ for each $i = 1, \ldots, n$. As $\tilde{P}$ is a stochastic matrix, we have $\tilde{\lambda}_n \ge -1$, and thus $(\lambda_n - \xi)/(1 - \xi) \ge -1$. Therefore, $\lambda_n \ge 2\xi - 1$. The proof is completed.

B.6 Proof of Theorem 4

We show this theorem by progressively proving a series of claims, as follows.

Claim 1. The augmented Markov chain is ergodic.
With $b_k > 0$ for $k = 0, \ldots, L-1$, the root is accessible from each state (including both complete and partial assignments). With $f_k > 0$ for $k = 1, \ldots, L$, each state is accessible from the root. These imply that any two states are accessible from each other via the root; therefore, the chain is irreducible. In addition, $f_L < 1$ makes the chain aperiodic. As this is a finite Markov chain, we can conclude that it is ergodic.

Since the chain is ergodic, it has a unique stationary distribution, i.e. its equilibrium distribution. Therefore, it suffices to show that a distribution $(\alpha \mu_0, \beta_1 \mu_1, \ldots, \beta_L \mu_L)$ satisfying the three statements of the theorem is a stationary distribution.

Claim 2. Let $\mu_0, \ldots, \mu_L$ be vectors respectively over the sets of states at levels $0, \ldots, L$, such that $\mu_0 = \mu$ is a distribution over $X$, and for each $k = 1, \ldots, L$, $\mu_k$ is defined recursively by
$$\mu_k(y) = \frac{1}{L - k + 1} \sum_{x \in \mathrm{Ch}(y)} \mu_{k-1}(x), \qquad \text{for } y \in Y_k. \qquad (39)$$
Then $\mu_k$, for each $k = 1, \ldots, L$, is a distribution over $Y_k$, and
$$\mu_k(y) = \binom{L}{k}^{-1} \sum_{x \in X:\, x \rightsquigarrow y} \mu(x), \qquad y \in Y_k. \qquad (40)$$
Here, $\mathrm{Ch}(y)$ denotes the set of child states of $y$, and $x \rightsquigarrow y$ means that $x$ is a descendant of $y$.

The proof of this claim is done by induction, as follows. When $k = 1$, according to the definition above, we have
$$\mu_1(y) = \frac{1}{L} \sum_{x \in \mathrm{Ch}(y)} \mu_X(x). \qquad (41)$$
This satisfies Eq.(40), as $\binom{L}{1} = L$, and it is clear that $\mu_1$ is non-negative. In addition, we have
$$\sum_{y \in Y_1} \mu_1(y) = \frac{1}{L} \sum_{y \in Y_1} \sum_{x \in \mathrm{Ch}(y)} \mu_X(x) = \frac{1}{L} \sum_{x \in X} |\mathrm{Pa}(x)|\, \mu_X(x) = 1. \qquad (42)$$
Here, $\mathrm{Pa}(x)$ is the set of parent states of $x$. In the derivation above, we use the fact that each $x \in X$ has $L$ parents, i.e. $|\mathrm{Pa}(x)| = L$. The identity above implies that $\mu_1$ is a valid distribution over $Y_1$. Therefore, the claim holds when $k = 1$.

Suppose that the claim holds for $k = 1, \ldots, m$ with $m < L$; we are to show that it also holds for $k = m+1$, so as to complete the induction. Note that $\mu_{m+1}$ is defined as $\mu_{m+1}(y) = \frac{1}{L - m} \sum_{z \in \mathrm{Ch}(y)} \mu_m(z)$, for $y \in Y_{m+1}$. Again, $\mu_{m+1}$ is obviously non-negative, and
$$\sum_{y \in Y_{m+1}} \mu_{m+1}(y) = \frac{1}{L - m} \sum_{y \in Y_{m+1}} \sum_{z \in \mathrm{Ch}(y)} \mu_m(z) = \frac{1}{L - m} \sum_{z \in Y_m} |\mathrm{Pa}(z)|\, \mu_m(z) = 1. \qquad (43)$$
Similar to the derivation for $k = 1$, here we apply the fact that $|\mathrm{Pa}(z)| = L - m$ for each $z \in Y_m$. This shows that $\mu_{m+1}$ is a valid distribution over $Y_{m+1}$. Moreover, we have for each $y \in Y_{m+1}$,
$$\mu_{m+1}(y) = \frac{1}{L - m} \sum_{z \in \mathrm{Ch}(y)} \frac{(L-m)!\, m!}{L!} \sum_{x \in X:\, x \rightsquigarrow z} \mu(x) = \frac{(L-m-1)!\, (m+1)!}{L!} \sum_{x \in X:\, x \rightsquigarrow y} \mu(x) = \binom{L}{m+1}^{-1} \sum_{x \in X:\, x \rightsquigarrow y} \mu(x), \qquad (44)$$
where the middle step uses the fact that each descendant $x$ of $y$ is a descendant of exactly $m + 1$ children of $y$. By induction, we can conclude that the claim holds for each $k = 1, \ldots, L$.

Claim 3. When the construction is done up to the $k$-th level, the distribution $\mu_k^+ = (c_{k,0}\, \mu_0, \ldots, c_{k,k}\, \mu_k)$ is a stationary distribution of the augmented Markov chain. Here, $\mu_0, \ldots, \mu_k$ are given by Claim 2, and the coefficients $c_{k,0}, \ldots, c_{k,k}$ are defined by
$$c_{k,0} = \frac{1}{Z_k}, \qquad c_{k,l} = \frac{1}{Z_k} \cdot \frac{b_0 \cdots b_{l-1}}{f_1 \cdots f_l} \quad (l = 1, \ldots, k), \qquad (45)$$
with
$$Z_k = 1 + \sum_{l=1}^{k} \frac{b_0 \cdots b_{l-1}}{f_1 \cdots f_l}. \qquad (46)$$
We are going to show this claim by induction. Note that $\mu_0 = \mu$ over $X$ is a stationary distribution of $P$, and $\mu_0^+ = c_{0,0}\, \mu_0$ with $c_{0,0} = 1$. It immediately follows that the claim is true for $k = 0$. Suppose the claim holds for $k = 0, \ldots, m$ with $m < L$; we are to show that it also holds for $k = m + 1$. Note from Eq.(45) that
$$c_{m+1, l} = c_{m, l}\, \frac{Z_m}{Z_{m+1}}, \qquad l = 0, \ldots, m. \qquad (47)$$
Hence, showing that the claim holds for $k = m+1$ is equivalent to showing that $(\alpha\, \mu_m^+, \beta\, \mu_{m+1})$ is a stationary distribution of the augmented chain (up to the $(m+1)$-th level), with
$$\alpha = \frac{Z_m}{Z_{m+1}}, \qquad (48)$$
and
$$\beta = \frac{Z_{m+1} - Z_m}{Z_{m+1}} = \frac{1}{Z_{m+1}} \cdot \frac{b_0 \cdots b_m}{f_1 \cdots f_{m+1}}. \qquad (49)$$
According to Lemma 1, it suffices to check that this distribution satisfies the cross-space detailed balance given in Eq.(2), which is not difficult to verify based on the construction described in Section 3.2.

In this proof, Claim 1 proves the ergodicity of the joint chain. Claim 2 constructs a set of vectors $\mu_0, \ldots, \mu_L$, and shows that they are valid distributions over $X, Y_1, \ldots, Y_L$ satisfying the property given in (S2). Claim 3 (with the induction carried up to $k = L$) shows that the distribution constructed in Claim 2 is exactly a stationary distribution of the joint chain. Since the chain is ergodic, this is the equilibrium distribution. As a by-product, Claim 3 also establishes statement (S3) of the theorem. Statement (S1) is automatically established by the construction described in Claim 2. Therefore, we can conclude that the proof of the theorem is completed.

B.7 Proof of Corollary 2

Based on Theorem 4, we have
$$\alpha = \Big(1 + \sum_{k=1}^{L} \frac{b_0 \cdots b_{k-1}}{f_1 \cdots f_k}\Big)^{-1} \ge \Big(1 + \sum_{k=1}^{L} \kappa^k\Big)^{-1} > \Big(\sum_{k=0}^{\infty} \kappa^k\Big)^{-1} = 1 - \kappa. \qquad (50)$$
Hence, $\alpha > 1 - \kappa$. The proof is done.
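As a numerical illustration of Eq.(50), the following sketch evaluates $\alpha$ for hypothetical rates $b_k$, $f_k$ over $L = 6$ levels satisfying $b_{k-1}/f_k \le \kappa < 1$, and confirms $\alpha > 1 - \kappa$:

```python
# Illustrative check of (S3) and Corollary 2 with hypothetical rates:
# alpha = 1 / (1 + sum_{k=1}^{L} (b_0...b_{k-1})/(f_1...f_k)) exceeds 1 - kappa
# whenever b_{k-1}/f_k <= kappa < 1 for all k.
L = 6
b = [0.3, 0.25, 0.2, 0.3, 0.25, 0.2]        # b_0 .. b_{L-1}
f = [0.8, 0.7, 0.9, 0.75, 0.8, 0.7]         # f_1 .. f_L
kappa = max(b[k] / f[k] for k in range(L))  # b[k]/f[k] is b_{k-1}/f_k in 0-based lists
assert kappa < 1

ratio = 1.0
s = 0.0
for k in range(L):
    ratio *= b[k] / f[k]                    # running product (b_0...b_k)/(f_1...f_{k+1})
    s += ratio
alpha = 1.0 / (1.0 + s)

assert alpha > 1 - kappa                    # Corollary 2
```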
More informationMATH 131A: REAL ANALYSIS (BIG IDEAS)
MATH 131A: REAL ANALYSIS (BIG IDEAS) Theorem 1 (The Triangle Inequality). For all x, y R we have x + y x + y. Proposition 2 (The Archimedean property). For each x R there exists an n N such that n > x.
More informationThe Distribution of Mixing Times in Markov Chains
The Distribution of Mixing Times in Markov Chains Jeffrey J. Hunter School of Computing & Mathematical Sciences, Auckland University of Technology, Auckland, New Zealand December 2010 Abstract The distribution
More informationBURGESS INEQUALITY IN F p 2. Mei-Chu Chang
BURGESS INEQUALITY IN F p 2 Mei-Chu Chang Abstract. Let be a nontrivial multiplicative character of F p 2. We obtain the following results.. Given ε > 0, there is δ > 0 such that if ω F p 2\F p and I,
More informationMARKOV CHAIN MONTE CARLO
MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with
More informationLecture 12: Detailed balance and Eigenfunction methods
Miranda Holmes-Cerfon Applied Stochastic Analysis, Spring 2015 Lecture 12: Detailed balance and Eigenfunction methods Readings Recommended: Pavliotis [2014] 4.5-4.7 (eigenfunction methods and reversibility),
More informationMarkov and Gibbs Random Fields
Markov and Gibbs Random Fields Bruno Galerne bruno.galerne@parisdescartes.fr MAP5, Université Paris Descartes Master MVA Cours Méthodes stochastiques pour l analyse d images Lundi 6 mars 2017 Outline The
More informationNotes on Row Reduction
Notes on Row Reduction Francis J. Narcowich Department of Mathematics Texas A&M University September The Row-Reduction Algorithm The row-reduced form of a matrix contains a great deal of information, both
More informationEntropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information
Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information 1 Conditional entropy Let (Ω, F, P) be a probability space, let X be a RV taking values in some finite set A. In this lecture
More informationAlternative Characterization of Ergodicity for Doubly Stochastic Chains
Alternative Characterization of Ergodicity for Doubly Stochastic Chains Behrouz Touri and Angelia Nedić Abstract In this paper we discuss the ergodicity of stochastic and doubly stochastic chains. We define
More informationA New Approximation Algorithm for the Asymmetric TSP with Triangle Inequality By Markus Bläser
A New Approximation Algorithm for the Asymmetric TSP with Triangle Inequality By Markus Bläser Presented By: Chris Standish chriss@cs.tamu.edu 23 November 2005 1 Outline Problem Definition Frieze s Generic
More informationMarkov Chains. Andreas Klappenecker by Andreas Klappenecker. All rights reserved. Texas A&M University
Markov Chains Andreas Klappenecker Texas A&M University 208 by Andreas Klappenecker. All rights reserved. / 58 Stochastic Processes A stochastic process X tx ptq: t P T u is a collection of random variables.
More informationNecessary and sufficient conditions for strong R-positivity
Necessary and sufficient conditions for strong R-positivity Wednesday, November 29th, 2017 The Perron-Frobenius theorem Let A = (A(x, y)) x,y S be a nonnegative matrix indexed by a countable set S. We
More informationLecture 4 Lebesgue spaces and inequalities
Lecture 4: Lebesgue spaces and inequalities 1 of 10 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 4 Lebesgue spaces and inequalities Lebesgue spaces We have seen how
More informationFirst Hitting Time Analysis of the Independence Metropolis Sampler
(To appear in the Journal for Theoretical Probability) First Hitting Time Analysis of the Independence Metropolis Sampler Romeo Maciuca and Song-Chun Zhu In this paper, we study a special case of the Metropolis
More information1.3 Convergence of Regular Markov Chains
Markov Chains and Random Walks on Graphs 3 Applying the same argument to A T, which has the same λ 0 as A, yields the row sum bounds Corollary 0 Let P 0 be the transition matrix of a regular Markov chain
More informationUnderstanding MCMC. Marcel Lüthi, University of Basel. Slides based on presentation by Sandro Schönborn
Understanding MCMC Marcel Lüthi, University of Basel Slides based on presentation by Sandro Schönborn 1 The big picture which satisfies detailed balance condition for p(x) an aperiodic and irreducable
More informationLIMITING PROBABILITY TRANSITION MATRIX OF A CONDENSED FIBONACCI TREE
International Journal of Applied Mathematics Volume 31 No. 18, 41-49 ISSN: 1311-178 (printed version); ISSN: 1314-86 (on-line version) doi: http://dx.doi.org/1.173/ijam.v31i.6 LIMITING PROBABILITY TRANSITION
More information#A45 INTEGERS 9 (2009), BALANCED SUBSET SUMS IN DENSE SETS OF INTEGERS. Gyula Károlyi 1. H 1117, Hungary
#A45 INTEGERS 9 (2009, 591-603 BALANCED SUBSET SUMS IN DENSE SETS OF INTEGERS Gyula Károlyi 1 Institute of Mathematics, Eötvös University, Pázmány P. sétány 1/C, Budapest, H 1117, Hungary karolyi@cs.elte.hu
More informationCommon Learning. August 22, Abstract
Common Learning Martin W. Cripps Jeffrey C. Ely George J. Mailath Larry Samuelson August 22, 2006 Abstract Consider two agents who learn the value of an unknown parameter by observing a sequence of private
More informationc i r i i=1 r 1 = [1, 2] r 2 = [0, 1] r 3 = [3, 4].
Lecture Notes: Rank of a Matrix Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk 1 Linear Independence Definition 1. Let r 1, r 2,..., r m
More informationMarkov Chains, Random Walks on Graphs, and the Laplacian
Markov Chains, Random Walks on Graphs, and the Laplacian CMPSCI 791BB: Advanced ML Sridhar Mahadevan Random Walks! There is significant interest in the problem of random walks! Markov chain analysis! Computer
More informationDynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor)
Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Matija Vidmar February 7, 2018 1 Dynkin and π-systems Some basic
More informationMDP Preliminaries. Nan Jiang. February 10, 2019
MDP Preliminaries Nan Jiang February 10, 2019 1 Markov Decision Processes In reinforcement learning, the interactions between the agent and the environment are often described by a Markov Decision Process
More informationApplied Stochastic Processes
Applied Stochastic Processes Jochen Geiger last update: July 18, 2007) Contents 1 Discrete Markov chains........................................ 1 1.1 Basic properties and examples................................
More informationFinal Exam Review. 2. Let A = {, { }}. What is the cardinality of A? Is
1. Describe the elements of the set (Z Q) R N. Is this set countable or uncountable? Solution: The set is equal to {(x, y) x Z, y N} = Z N. Since the Cartesian product of two denumerable sets is denumerable,
More informationNotes on the second moment method, Erdős multiplication tables
Notes on the second moment method, Erdős multiplication tables January 25, 20 Erdős multiplication table theorem Suppose we form the N N multiplication table, containing all the N 2 products ab, where
More information7x 5 x 2 x + 2. = 7x 5. (x + 1)(x 2). 4 x
Advanced Integration Techniques: Partial Fractions The method of partial fractions can occasionally make it possible to find the integral of a quotient of rational functions. Partial fractions gives us
More informationSection Notes 9. Midterm 2 Review. Applied Math / Engineering Sciences 121. Week of December 3, 2018
Section Notes 9 Midterm 2 Review Applied Math / Engineering Sciences 121 Week of December 3, 2018 The following list of topics is an overview of the material that was covered in the lectures and sections
More informationAN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES
AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES JOEL A. TROPP Abstract. We present an elementary proof that the spectral radius of a matrix A may be obtained using the formula ρ(a) lim
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationComputing with Semigroup Congruences. Michael Torpey
Computing with Semigroup Congruences Michael Torpey 5th September 2014 Abstract In any branch of algebra, congruences are an important topic. An algebraic object s congruences describe its homomorphic
More information5: The Integers (An introduction to Number Theory)
c Oksana Shatalov, Spring 2017 1 5: The Integers (An introduction to Number Theory) The Well Ordering Principle: Every nonempty subset on Z + has a smallest element; that is, if S is a nonempty subset
More informationTheorem 5.3. Let E/F, E = F (u), be a simple field extension. Then u is algebraic if and only if E/F is finite. In this case, [E : F ] = deg f u.
5. Fields 5.1. Field extensions. Let F E be a subfield of the field E. We also describe this situation by saying that E is an extension field of F, and we write E/F to express this fact. If E/F is a field
More informationCTL Model Checking. Prof. P.H. Schmitt. Formal Systems II. Institut für Theoretische Informatik Fakultät für Informatik Universität Karlsruhe (TH)
CTL Model Checking Prof. P.H. Schmitt Institut für Theoretische Informatik Fakultät für Informatik Universität Karlsruhe (TH) Formal Systems II Prof. P.H. Schmitt CTLMC Summer 2009 1 / 26 Fixed Point Theory
More informationMAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9
MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended
More information