Markov Chains, Conductance and Mixing Time


Lecturer: Santosh Vempala
Scribe: Kevin Zatloukal

Markov Chains

A random walk is a Markov chain. As before, we have a state space Ω. We also have a set A of subsets of Ω that form a σ-algebra, i.e., A is closed under complements and countable unions. It is an immediate consequence that A is closed under countable intersections and that ∅, Ω ∈ A. For any u ∈ Ω and A ∈ A, we have the one-step probability P_u(A), which tells us the probability of being in A after taking one step from u. Lastly, we have a starting distribution Q_0 on Ω, which gives us a probability Q_0(A) of starting in the set A ∈ A. With this setup, a Markov chain is a sequence of points w_0, w_1, w_2, ... such that P(w_0 ∈ A) = Q_0(A) and

  P(w_{i+1} ∈ A | w_0 = u_0, ..., w_i = u_i) = P(w_{i+1} ∈ A | w_i = u_i) = P_{u_i}(A),

for any A ∈ A. A distribution Q is called stationary if, for all A ∈ A,

  Q(A) = ∫_Ω P_u(A) dQ(u).

In other words, Q is stationary if the probability of being in A is the same after one step. We also have a generalized version of the symmetry we saw above in the transition probabilities (p_{xy} = p_{yx}). The Markov chain is called time-reversible if, for all A, B ∈ A,

  P(w_{i+1} ∈ B | w_i ∈ A) = P(w_{i+1} ∈ A | w_i ∈ B).

Ball Walk

The following algorithm, called Ball-Walk, is a continuous random walk in a convex body K. In this case, the set corresponding to the neighborhood of x is the ball B_δ(x) = x + δB_n.

Algorithm Ball-Walk(δ):
1. Let x be a starting point in K.
2. Repeat sufficiently many times:
     Choose a random y ∈ B_δ(x).
     If y ∈ K, set x = y.
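The two steps above can be sketched directly in code. This is a minimal illustration, not part of the lecture: it assumes K is given by a membership oracle `in_body`, and samples uniformly from B_δ(x) by picking a uniform direction and a radius r = δ·U^(1/n). All names here (`ball_walk`, `in_cube`) are hypothetical.

```python
import math
import random

def ball_walk(start, delta, steps, in_body, dim, seed=0):
    """Run the Ball-Walk for `steps` proposed moves inside the body K.

    `in_body(y)` is the membership oracle for K; a proposed point
    outside K is rejected and the walk stays in place.
    """
    rng = random.Random(seed)
    x = list(start)
    for _ in range(steps):
        # Sample y uniformly from B_delta(x): Gaussian direction,
        # normalized, scaled by delta * U^(1/dim).
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        r = delta * rng.random() ** (1.0 / dim)
        y = [xi + r * gi / norm for xi, gi in zip(x, g)]
        if in_body(y):          # accept only moves that stay in K
            x = y
    return x

# Example body: the unit cube [0,1]^3.
in_cube = lambda p: all(0.0 <= c <= 1.0 for c in p)
final = ball_walk([0.5, 0.5, 0.5], delta=0.2, steps=1000,
                  in_body=in_cube, dim=3)
```

Rejected proposals correspond exactly to the "staying" probability P_u({u}) computed below.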

Here, the state space is all of the set, so Ω = K. The σ-algebra is the measurable subsets of K, as usual. We can define a density function for the probability of transitioning from u ∈ K to v, provided that u ≠ v:

  p(u, v) = { 1/Vol(δB_n) if v ∈ K ∩ B_δ(u),
            { 0            otherwise.

The probability of staying at u is

  P_u({u}) = 1 − Vol(K ∩ B_δ(u)) / Vol(δB_n).

Putting these together, the probability of transitioning from u to any measurable subset A is

  P_u(A) = { Vol(A ∩ K ∩ B_δ(u)) / Vol(δB_n) + P_u({u}) if u ∈ A,
           { Vol(A ∩ K ∩ B_δ(u)) / Vol(δB_n)            if u ∉ A.

Since the density function is symmetric, it is easy to see that the uniform distribution is stationary. We can also verify this directly. We can compute the probability of being in A after one step by adding up, over all of Ω, the probability that we transition into A. Thus, after one step from the uniform distribution dQ(u) = du / Vol(K), the probability of being in A is

  ∫_A P_u({u}) dQ(u) + ∫_Ω ( ∫_{v ∈ A∩K∩B_δ(u)\{u}} p(u, v) dv ) dQ(u)
    = ∫_A P_u({u}) dQ(u) + ∫_A ( ∫_{v ∈ K∩B_δ(u)} p(u, v) dv ) dQ(u)
    = ∫_A ( 1 − Vol(K ∩ B_δ(u)) / Vol(δB_n) ) dQ(u) + ∫_A ( Vol(K ∩ B_δ(u)) / Vol(δB_n) ) dQ(u)
    = ∫_A dQ(u) = Q(A),

where the first equality uses Fubini and the symmetry p(u, v) = p(v, u). Another way to look at the one-step probability distributions is in terms of flow. For any subset A ∈ A, the ergodic flow Φ(A) is the probability of transitioning from A to Ω \ A, i.e.,

  Φ(A) = ∫_A P_u(Ω \ A) dQ(u).

Intuitively, in a stationary distribution, we should have Φ(A) = Φ(Ω \ A). In fact, this is a characterization of the stationary distribution.
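The flow identity Φ(A) = Φ(Ω \ A) is easy to check numerically on a small finite chain. The following sketch (not part of the lecture; the transition matrix and the set A are arbitrary choices) computes the stationary distribution by power iteration and compares the flows in both directions.

```python
# Sanity check: for a stationary Q, flow out of A equals flow into A.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]            # row-stochastic transition matrix

def stationary(P, iters=10_000):
    # Power iteration: push a distribution through P until it settles.
    q = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        q = [sum(q[i] * P[i][j] for i in range(len(P)))
             for j in range(len(P))]
    return q

Q = stationary(P)
A = {0, 2}                        # an arbitrary subset of the state space
B = set(range(len(P))) - A

def flow(S, T):
    # Phi(S -> T) = sum over u in S of Q(u) * P_u(T)
    return sum(Q[u] * P[u][v] for u in S for v in T)

assert abs(flow(A, B) - flow(B, A)) < 1e-9
```

The assertion holds for every subset A, which is exactly the content of the theorem proved next.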

Theorem 1. A distribution Q is stationary iff Φ(A) = Φ(Ω \ A) for all A ∈ A.

Proof. Consider their difference:

  Φ(A) − Φ(Ω \ A) = ∫_A P_u(Ω \ A) dQ(u) − ∫_{u ∉ A} P_u(A) dQ(u)
    = ∫_A (1 − P_u(A)) dQ(u) − ∫_{u ∉ A} P_u(A) dQ(u)
    = Q(A) − ∫_Ω P_u(A) dQ(u).

The latter integral is the probability of being in A after one step, so Φ(A) − Φ(Ω \ A) = 0 for all A iff Q is stationary. ∎

Now, let's return to the question of how quickly (and whether) this random walk converges to the stationary distribution. As before, it is convenient to make the walk lazy by giving it probability 1/2 of staying in place instead of taking a step. This means that P_u({u}) ≥ 1/2 and, more generally, for any A ∈ A, P_u(A) ≥ 1/2 if u ∈ A and P_u(A) ≤ 1/2 if u ∉ A. Also as before, the notion of conductance will be useful. In this general context, we define conductance as

  φ = min_{A ∈ A, Q(A) ≤ 1/2} Φ(A) / Q(A),

where Q is the stationary distribution. The ratio Φ(A)/Q(A) is the probability of transitioning to Ω \ A given that we are starting in A. We must also generalize the notion of the distance of a distribution from stationary. The straightforward generalization of our previous definition is

  ‖Q_t − Q‖ = ∫_Ω |dQ_t(u) − dQ(u)| = ∫_{Q_t^+} (dQ_t(u) − dQ(u)) + ∫_{Q_t^−} (dQ(u) − dQ_t(u)),

where Q_t^+ = {u ∈ Ω : dQ_t(u) ≥ dQ(u)} and Q_t^− = {u ∈ Ω : dQ_t(u) < dQ(u)}. Since we know that

  1 = ∫_{Q_t^+} dQ_t(u) + ∫_{Q_t^−} dQ_t(u) = ∫_{Q_t^+} dQ(u) + ∫_{Q_t^−} dQ(u),

we can rearrange to get

  ∫_{Q_t^+} (dQ_t(u) − dQ(u)) = ∫_{Q_t^−} (dQ(u) − dQ_t(u)),

which shows that the two terms from above are equal. Therefore, we can just as well define

  ∫_{Q_t^+} (dQ_t(u) − dQ(u)) = sup_{A ∈ A} (Q_t(A) − Q(A)).

We will take this as our definition of the distance d(Q_t, Q) between distributions Q_t and Q. We would like to know how large t must be before d(Q_t, Q) ≤ (1/2) d(Q_0, Q). This is one definition of the mixing time. It would be nice to get a bound by showing that Q_t(A) − Q(A) drops quickly for every A. This is not the case. Instead, we can look at sup_{Q(A)=x} (Q_t(A) − Q(A)) for each fixed x ∈ [0, 1]. A bound for every x would imply the bound we need. To prove a bound on this by induction, we will define it in a formally weaker way. Our upper bound is

  h_t(x) = sup_{g ∈ F_x} ∫_Ω g(u) (dQ_t(u) − dQ(u)) = sup_{g ∈ F_x} ∫_Ω g(u) dQ_t(u) − x,

where F_x is the set of functions

  F_x = { g : Ω → [0, 1] : ∫_Ω g(u) dQ(u) = x }.

It is clear that h_t(x) is an upper bound on sup_{Q(A)=x} (Q_t(A) − Q(A)) since g could be the characteristic function of A. The following lemma shows that these two quantities are in fact equal as long as Q is atom-free, i.e., there is no x ∈ Ω such that Q({x}) > 0.

Lemma 2. If Q is atom-free, then h_t(x) = sup_{Q(A)=x} (Q_t(A) − Q(A)).

Proof (Sketch). Consider the set of points X that maximize dQ_t/dQ, their "value density". (This part would not be possible if Q had an atom.) Put g(x) = 1 for all x ∈ X. These points give the maximum payoff per unit of weight from g, so it is optimal to put as much weight on them as possible. Now, find the set of maximizing points in Ω \ X and set g(x) = 1 at these points as well. Continue until the set of points with g(x) = 1 has measure x. ∎
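The greedy construction in the proof sketch has a direct discrete analogue: sort the states by their value density dQ_t/dQ and assign g = 1 greedily, fractionally filling at most one state at the end. The sketch below (an illustration with arbitrary distributions, not part of the lecture; the name `h_t` is ours) verifies it against a brute-force search over sets of the right measure.

```python
from itertools import combinations

def h_t(x, qt, q):
    """Greedy maximizer of sum g*(qt - q) over 0 <= g <= 1, sum g*q = x."""
    order = sorted(range(len(q)), key=lambda i: qt[i] / q[i], reverse=True)
    total, val = 0.0, 0.0
    for i in order:
        take = min(1.0, (x - total) / q[i])   # fraction of state i used
        val += take * (qt[i] - q[i])
        total += take * q[i]
        if total >= x - 1e-15:
            break
    return val

qt = [0.5, 0.3, 0.1, 0.1]          # current distribution Q_t
q  = [0.25, 0.25, 0.25, 0.25]      # stationary distribution Q

# Brute force over two-element subsets, each of Q-measure x = 0.5.
best = max(sum(qt[i] - q[i] for i in A) for A in combinations(range(4), 2))
assert abs(h_t(0.5, qt, q) - best) < 1e-12
```

With atoms (as here), the greedy optimum may need a fractional weight on one state, which is exactly the caveat discussed next.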

In fact, this argument shows that when Q is atom-free, we can find a set A that achieves the supremum. When Q has atoms, we can include the high-value atoms and use this procedure on the non-atom subset of Ω; however, we may need to include a fraction of one atom to achieve the supremum. This shows that h_t(x) can be achieved by a function g that is 0–1 valued everywhere except at most one point. Another important fact about h_t is the following.

Lemma 3. The function h_t is concave.

Proof. Let g_1 ∈ F_x and g_2 ∈ F_y. Then αg_1 + (1−α)g_2 ∈ F_{αx+(1−α)y}, which implies that

  h_t(αx + (1−α)y) ≥ ∫_Ω (αg_1(u) + (1−α)g_2(u)) dQ_t(u) − (αx + (1−α)y)
    = α( ∫_Ω g_1(u) dQ_t(u) − x ) + (1−α)( ∫_Ω g_2(u) dQ_t(u) − y ).

Since this holds for any such g_1 and g_2, it must hold if we take the sup, which gives us

  h_t(αx + (1−α)y) ≥ α h_t(x) + (1−α) h_t(y).

Thus, h_t is concave. ∎

Now, we come to the main lemma relating h_t to h_{t−1}. This will allow us to put a bound on d(Q_t, Q).

Lemma 4. Let Q be atom-free, and y = min{x, 1−x}. Then

  h_t(x) ≤ (1/2) h_{t−1}(x − 2φy) + (1/2) h_{t−1}(x + 2φy).

Proof. Assume that y = x, i.e., x ≤ 1/2. The other part is similar. We will construct two functions, g_1 and g_2, and use these to bound h_t(x). Let A ∈ A be a subset to be chosen later with Q(A) = x. Let

  g_1(u) = { 2P_u(A) − 1 if u ∈ A,
           { 0           if u ∉ A,

and

  g_2(u) = { 1        if u ∈ A,
           { 2P_u(A)  if u ∉ A.

(Laziness guarantees that both take values in [0, 1].) First, note that (g_1 + g_2)(u) = 2P_u(A) for all u ∈ Ω, which means that

  ∫_Ω g_1(u) dQ_{t−1}(u) + ∫_Ω g_2(u) dQ_{t−1}(u) = ∫_Ω 2P_u(A) dQ_{t−1}(u) = 2Q_t(A).

On the other hand,

  ∫_Ω g_1(u) dQ(u) + ∫_Ω g_2(u) dQ(u) = ∫_Ω 2P_u(A) dQ(u) = 2Q(A) = 2x,

since Q is stationary. Hence (1/2)(g_1 + g_2) ∈ F_x. Putting these together, we have

  ∫_Ω g_1(u) (dQ_{t−1}(u) − dQ(u)) + ∫_Ω g_2(u) (dQ_{t−1}(u) − dQ(u)) = 2(Q_t(A) − Q(A)).

Since Q is atom-free, there is a subset A ⊆ Ω such that

  2 h_t(x) = 2(Q_t(A) − Q(A))
    = ∫_Ω g_1(u) (dQ_{t−1}(u) − dQ(u)) + ∫_Ω g_2(u) (dQ_{t−1}(u) − dQ(u))
    ≤ h_{t−1}(x_1) + h_{t−1}(x_2),

where x_1 = ∫_Ω g_1(u) dQ(u) and x_2 = ∫_Ω g_2(u) dQ(u). Now, we know that x_1 + x_2 = 2x. Specifically, we can see that

  x_1 = ∫_Ω g_1(u) dQ(u) = ∫_A (2P_u(A) − 1) dQ(u) = ∫_A (1 − 2P_u(Ω \ A)) dQ(u)
    = x − 2 ∫_A P_u(Ω \ A) dQ(u) = x − 2Φ(A) ≤ x − 2φx = x(1 − 2φ).

This implies that x_2 ≥ x(1 + 2φ). Since h_{t−1} is concave, the chord from x_1 to x_2 on h_{t−1} lies below the chord from x(1 − 2φ) to x(1 + 2φ). (See Figure 1.) Therefore,

  h_t(x) ≤ (1/2) h_{t−1}(x(1 − 2φ)) + (1/2) h_{t−1}(x(1 + 2φ)). ∎

Figure 1: The value of h_t(x) lies below the line between h_{t−1}(x_1) and h_{t−1}(x_2). Since x_1 ≤ x(1 − 2φ) ≤ x ≤ x(1 + 2φ) ≤ x_2 and h_{t−1} is concave, this implies that h_t(x) lies below the line between h_{t−1}(x(1 − 2φ)) and h_{t−1}(x(1 + 2φ)).

Given some information about Q_0, this lemma allows us to put bounds on the rate of convergence to the stationary distribution.

Theorem 5. Let 0 ≤ s ≤ 1, and let C_0 and C_1 be such that

  h_0(x) ≤ C_0 + C_1 min{x^s, (1−x)^s}.

Then

  h_t(x) ≤ C_0 + C_1 min{x^s, (1−x)^s} (1 − φ²/2)^t.

Proof. We argue by induction on t for s = 1/2 (the case we use below). The inequality is true for t = 0 by our hypothesis. Now, suppose that x ≤ 1/2 and the inequality holds for all values less than t. From the lemma, we know that

  h_t(x) ≤ (1/2) h_{t−1}(x(1 − 2φ)) + (1/2) h_{t−1}(x(1 + 2φ))
    ≤ C_0 + (1/2) C_1 ( √(x(1 − 2φ)) + √(x(1 + 2φ)) ) (1 − φ²/2)^{t−1}
    = C_0 + (1/2) C_1 √x ( √(1 − 2φ) + √(1 + 2φ) ) (1 − φ²/2)^{t−1}
    ≤ C_0 + C_1 √x (1 − φ²/2)^t,

provided that √(1 − 2φ) + √(1 + 2φ) ≤ 2(1 − φ²/2). The latter inequality can be verified by squaring both sides, rearranging, and squaring again. ∎
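The final inequality in the proof is elementary but easy to get wrong when reconstructing it; a quick numerical sweep over the valid range of φ confirms it. This check is an illustration only, not part of the lecture.

```python
import math

# Check: (sqrt(1 - 2*phi) + sqrt(1 + 2*phi)) / 2  <=  1 - phi**2 / 2
# for all conductances phi in [0, 1/2].
for k in range(501):
    phi = k / 1000.0                  # phi sweeps [0, 0.5] in steps of 0.001
    lhs = (math.sqrt(1 - 2 * phi) + math.sqrt(1 + 2 * phi)) / 2
    rhs = 1 - phi ** 2 / 2
    assert lhs <= rhs + 1e-12
```

The slack is smallest near φ = 0 (both sides approach 1), matching the fact that the bound degrades gracefully for poorly conducting chains.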

Corollary 6. If M = sup_{A ∈ A} Q_0(A)/Q(A), then we have

  d(Q_t, Q) ≤ √M (1 − φ²/2)^t.

Proof. By the definition of M, we know that

  h_0(x) ≤ min{Mx, 1} − x ≤ min{Mx, 1}.

Next, we will show that min{Mx, 1} ≤ √(Mx). If Mx = min{Mx, 1}, then Mx ≤ 1, which implies that Mx ≤ √(Mx). If 1 = min{Mx, 1}, then Mx ≥ 1, which implies that 1 ≤ √(Mx). So we have shown that

  h_0(x) ≤ min{Mx, 1} ≤ √M √x.

(Similarly, h_0(x) ≤ 1 − x ≤ √M √(1−x), since M ≥ 1; so h_0(x) ≤ √M min{√x, √(1−x)}.) Thus, by the last theorem (with C_0 = 0, C_1 = √M, and s = 1/2), we know that

  d(Q_t, Q) ≤ max_x h_t(x) ≤ √M (1 − φ²/2)^t. ∎
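The whole chain of results can be checked end to end on a small example. The sketch below (our illustration, not from the lecture) takes the lazy random walk on a 6-cycle, computes its conductance φ by brute force over subsets, starts at a single vertex (so M = 6), and verifies d(Q_t, Q) ≤ √M (1 − φ²/2)^t at every step.

```python
import math
from itertools import combinations

# Lazy random walk on a 6-cycle.
n = 6
P = [[0.0] * n for _ in range(n)]
for i in range(n):
    P[i][i] = 0.5                     # lazy: stay with probability 1/2
    P[i][(i + 1) % n] = 0.25
    P[i][(i - 1) % n] = 0.25

Q = [1.0 / n] * n                     # uniform is stationary here

def ergodic_flow(A):
    return sum(Q[u] * P[u][v] for u in A for v in range(n) if v not in A)

# Conductance: minimize Phi(A)/Q(A) over subsets with Q(A) <= 1/2.
phi = min(ergodic_flow(set(A)) / (len(A) / n)
          for r in range(1, n // 2 + 1)
          for A in combinations(range(n), r))

Q0 = [1.0] + [0.0] * (n - 1)          # start at a single vertex
M = max(Q0[i] / Q[i] for i in range(n))

dist = Q0
for t in range(1, 51):
    dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    d = sum(max(dist[i] - Q[i], 0.0) for i in range(n))   # sup_A Q_t(A)-Q(A)
    assert d <= math.sqrt(M) * (1 - phi ** 2 / 2) ** t + 1e-12
```

For this chain φ = 1/6 (achieved by a contiguous arc of three vertices), so the corollary's bound decays like (1 − 1/72)^t; the true distance decays faster, as the bound only needs to be an upper estimate.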