Mixing time for a random walk on a ring


Mixing time for a random walk on a ring
Stephen Connor, joint work with Michael Bate
Paris, September 2013

Introduction

Let $X$ be a discrete-time Markov chain on a finite state space $S$, with transition matrix $P = (p_{ij})$, where $p_{ij} = P(X_{t+1} = j \mid X_t = i)$. The vector $\pi$ is a stationary distribution for $X$ if $\pi = \pi P$. If $X$ is ergodic then we know that $P(X_t = j \mid X_0 = i) \to \pi_j$ as $t \to \infty$, for all $i, j \in S$. But how fast does this convergence occur?

First choose a metric.

Definition. The total variation distance between two probability measures $\mu$ and $\mu'$ on $S$ is given by
$$\|\mu - \mu'\|_{TV} = \sup_{A \subseteq S} \big(\mu(A) - \mu'(A)\big) = \frac{1}{2} \sum_{s \in S} |\mu(s) - \mu'(s)|.$$

Suppose we start the chain $X$ at position $X_0 = x$. Write
$$d_x(t) = \|\delta_x P^t - \pi\|_{TV}.$$
This measures the distance between the distribution of $X_t$ and $\pi$.
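As a quick numerical illustration (not part of the original slides), the second form of the definition is easy to compute directly; the function name `tv_distance` is chosen here for the sketch:

```python
import numpy as np

def tv_distance(mu, nu):
    """Total variation distance between two distributions on a finite set,
    using ||mu - nu||_TV = (1/2) * sum_s |mu(s) - nu(s)|."""
    mu, nu = np.asarray(mu, dtype=float), np.asarray(nu, dtype=float)
    return 0.5 * np.abs(mu - nu).sum()

# Example: distance between a point mass and the uniform distribution on 4 states.
delta = np.array([1.0, 0.0, 0.0, 0.0])
uniform = np.full(4, 0.25)
print(tv_distance(delta, uniform))  # 0.75
```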

Now introduce a parameter to measure the time until $d_x(t)$ is small.

Definition. The mixing time of $X$, when $X_0 = x$, is
$$\tau_x(\varepsilon) = \min\{t : d_x(t) \le \varepsilon\},$$
and (by convention) we define $\tau_x = \tau_x(1/4)$.

It is easy to show that $\tau_x(\varepsilon) \le \lceil \log_2 \varepsilon^{-1} \rceil \, \tau_x$.
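The definitions of $d_x(t)$ and $\tau_x(\varepsilon)$ can be evaluated exactly for a small chain by iterating the transition matrix; this sketch (names `d_x`, `mixing_time` are mine, and the lazy walk on $\mathbb{Z}_5$ is just a test case, not from the talk):

```python
import numpy as np

def d_x(P, x, t, pi):
    """d_x(t) = || delta_x P^t - pi ||_TV for a transition matrix P."""
    row = np.zeros(P.shape[0]); row[x] = 1.0
    for _ in range(t):
        row = row @ P
    return 0.5 * np.abs(row - pi).sum()

def mixing_time(P, x, pi, eps=0.25, t_max=10_000):
    """tau_x(eps) = min { t : d_x(t) <= eps }."""
    row = np.zeros(P.shape[0]); row[x] = 1.0
    for t in range(t_max + 1):
        if 0.5 * np.abs(row - pi).sum() <= eps:
            return t
        row = row @ P
    raise RuntimeError("not mixed within t_max steps")

# Lazy simple random walk on Z_5: stay w.p. 1/2, move +-1 w.p. 1/4 each.
n = 5
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[i, (i + 1) % n] += 0.25
    P[i, (i - 1) % n] += 0.25
pi = np.full(n, 1 / n)
print(mixing_time(P, 0, pi))
```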

Random walks on groups

Suppose now that our state space is a finite group $G$, and that $X$ has transitions governed by $P(X_{t+1} = hg \mid X_t = g) = Q(h)$ for all $g, h \in G$, for some probability distribution $Q$ on $G$. Then $X$ is a random walk on a group, and (as long as $Q$ isn't concentrated on a subgroup) its stationary distribution is uniform: $\pi(g) = 1/|G|$ for all $g \in G$.

The distribution after repeated steps is given by convolution:
$$P(X_2 = g \mid X_0 = \mathrm{id}) = Q * Q(g) = \sum_h Q(gh^{-1}) Q(h).$$
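The convolution identity can be checked numerically for the abelian group $\mathbb{Z}_6$ (written additively, so $gh^{-1}$ becomes $g - h$); this is an illustrative sketch with an arbitrary step distribution, not an example from the talk:

```python
import numpy as np

n = 6
rng = np.random.default_rng(0)
Q = rng.dirichlet(np.ones(n))   # an arbitrary step distribution on Z_6

# Two-step distribution via convolution: (Q*Q)(g) = sum_h Q(g - h) Q(h).
QQ = np.array([sum(Q[(g - h) % n] * Q[h] for h in range(n)) for g in range(n)])

# Same thing via the transition matrix P(g, h+g) = Q(h), started at the identity 0.
P = np.zeros((n, n))
for g in range(n):
    for h in range(n):
        P[g, (h + g) % n] += Q[h]
two_step = np.linalg.matrix_power(P, 2)[0]

assert np.allclose(QQ, two_step)
print(QQ)
```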

We're often interested in a natural sequence of processes $X^{(n)}$ on groups $G^{(n)}$ of increasing size: how does the mixing time $\tau_x^{(n)}$ scale with $n$?

In lots of nice examples a cutoff phenomenon is exhibited: for all $\varepsilon > 0$,
$$\lim_{n \to \infty} \frac{\tau_x^{(n)}(\varepsilon)}{\tau_x^{(n)}(1 - \varepsilon)} = 1.$$

[Plot: total variation distance as a function of time, dropping abruptly from near 1 to near 0 at the cutoff time.]

Random walk on a ring

What if we add some additional structure, and try to move from a walk on a group to a walk on a ring? For example: random walk on $\mathbb{Z}_n$ ($n$ odd) with
$$x \mapsto \begin{cases} x + 1 & \text{w.p. } 1 - p_n \\ 2x & \text{w.p. } p_n \end{cases}$$

[Diagram: the ring $\mathbb{Z}_7$, with $+1$ steps of probability $1 - p$ around the cycle and doubling jumps of probability $p$.]
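This walk is small enough to study exactly: the sketch below (names mine, parameter values arbitrary) builds its transition matrix on $\mathbb{Z}_{101}$ and checks that the uniform distribution is stationary, i.e. that the matrix is doubly stochastic, which holds because doubling is a bijection on $\mathbb{Z}_n$ for odd $n$:

```python
import numpy as np

def ring_walk_matrix(n, p):
    """Transition matrix on Z_n for: x -> x+1 w.p. 1-p, x -> 2x w.p. p."""
    P = np.zeros((n, n))
    for x in range(n):
        P[x, (x + 1) % n] += 1 - p
        P[x, (2 * x) % n] += p
    return P

n, p = 101, 0.05          # n odd, so doubling is a bijection on Z_n
P = ring_walk_matrix(n, p)
pi = np.full(n, 1 / n)
assert np.allclose(P.sum(axis=0), 1.0)  # doubly stochastic => uniform is stationary

# Distance to uniform from X_0 = 0 after a few hundred steps:
row = np.zeros(n); row[0] = 1.0
for _ in range(400):
    row = row @ P
print(0.5 * np.abs(row - pi).sum())
```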

Related results

Chung, Diaconis and Graham (1987) study the process (used in random number generation)
$$x \mapsto \begin{cases} ax - 1 & \text{w.p. } 1/3 \\ ax & \text{w.p. } 1/3 \\ ax + 1 & \text{w.p. } 1/3 \end{cases}$$

When $a = 1$ there exist constants $C$ and $C'$ such that
$$e^{-Ct/n^2} < \|P_t^{(n)} - \pi^{(n)}\|_{TV} < e^{-C't/n^2}.$$

When $a = 2$ and $n = 2^m - 1$ there exist constants $c$ and $c'$ such that:
- for $t_n \ge c \log n \log\log n$, $\|P_{t_n}^{(n)} - \pi^{(n)}\|_{TV} \to 0$ as $n \to \infty$;
- for $t_n \le c' \log n \log\log n$, $\|P_{t_n}^{(n)} - \pi^{(n)}\|_{TV} > \varepsilon$ as $n \to \infty$.
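The contrast between the two regimes ($\Theta(n^2)$ steps for $a = 1$ versus roughly $\log n \log\log n$ for $a = 2$) can be seen even at a tiny size; this sketch (helper names mine, $n = 31 = 2^5 - 1$ chosen to match the $a = 2$ result) computes the exact $\tau(1/4)$ for both:

```python
import numpy as np

def cdg_matrix(n, a):
    """Transition matrix on Z_n for x -> a*x - 1, a*x, a*x + 1, each w.p. 1/3."""
    P = np.zeros((n, n))
    for x in range(n):
        for y in (a * x - 1, a * x, a * x + 1):
            P[x, y % n] += 1 / 3
    return P

def tau(P, eps=0.25, t_max=10_000):
    """Exact mixing time from state 0, with uniform stationary distribution."""
    n = P.shape[0]
    row = np.zeros(n); row[0] = 1.0
    for t in range(t_max + 1):
        if 0.5 * np.abs(row - 1 / n).sum() <= eps:
            return t
        row = row @ P
    raise RuntimeError("not mixed within t_max steps")

n = 2**5 - 1   # n = 31, of the form 2^m - 1 used in the a = 2 result
print("a=1:", tau(cdg_matrix(n, 1)), "steps;  a=2:", tau(cdg_matrix(n, 2)), "steps")
```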

Back to our process...

General problem: the distribution of $X_t$ isn't given by convolution:
$$X_t = 2 I_t X_{t-1} + (1 - I_t)(X_{t-1} + 1), \quad \text{where } I_t \sim \mathrm{Bern}(p_n).$$

But for this relatively simple walk, we can get around this by looking at the process subsampled at jump times. (Here we call a $+1$ move a "step" and a $\times 2$ move a "jump".)

So consider (with $X_0 = Y_0 = 0$)
$$Y_k = \sum_{j=1}^{k} 2^{k+1-j} B_j \pmod n, \qquad B_j \text{ i.i.d. } \mathrm{Geom}(p_n),$$
i.e. $Y_k$ is the position of $X$ immediately following the $k$th jump.
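The identity for $Y_k$ can be verified by simulation: run $X$ step-by-step with the same geometric step counts and compare its position after the $k$th jump with the closed-form sum. This is an illustrative sketch (variable names and parameters mine); note NumPy's geometric distribution is supported on $\{1, 2, \dots\}$, so it is shifted to get $\mathrm{Geom}(p)$ on $\{0, 1, \dots\}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 101, 0.05, 10

# Direct construction: Y_k = sum_j 2^{k+1-j} B_j (mod n), B_j ~ Geom(p) on {0,1,...}.
B = rng.geometric(p, size=k) - 1          # shift NumPy's support {1,2,...} down to {0,1,...}
Y_k = sum(pow(2, k + 1 - j, n) * int(B[j - 1]) for j in range(1, k + 1)) % n

# Same thing by running X and recording its position just after the k-th jump.
x, jumps = 0, 0
for b in B:                               # b = number of +1 "steps" before the next jump
    x = (x + b) % n                       # take b unit steps...
    x = (2 * x) % n                       # ...then jump (multiply by 2)
    jumps += 1
assert jumps == k
print(Y_k, x)
```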

Plan

1. Find a lower bound for the mixing time of $Y$;
2. Find an upper bound for the mixing time of $Y$;
3. Try to relate these back to the mixing time for $X$.

Restrict attention to $p_n = \frac{1}{2n^\alpha}$, $\alpha \in (0, 1)$. We expect things to happen (for $Y$) sometime around
$$T_n := \log_2(n p_n) \sim (1 - \alpha) \log_2 n.$$

Theorem. $Y$ exhibits a cutoff at time $T_n$, with window size $O(1)$.
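For this choice of $p_n$ we have $n p_n = n^{1-\alpha}/2$, so $T_n = (1-\alpha)\log_2 n - 1$; a tiny sketch (function name mine) makes the scaling concrete:

```python
import math

def T_n(n, alpha):
    """Predicted Y-mixing time T_n = log2(n * p_n) with p_n = 1/(2 n^alpha)."""
    p = 1.0 / (2 * n ** alpha)
    return math.log2(n * p)

# Since n*p_n = n^(1-alpha)/2, this equals (1-alpha)*log2(n) - 1.
for n in (10**3, 10**6, 10**9):
    print(n, round(T_n(n, alpha=0.5), 3))
```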

Lower bound

For a lower bound, we simply find a set $A_n(\theta)$ (of considerable size) that $Y$ has very little chance of hitting before time $T_n - \theta$, for some $\theta \in \mathbb{N}$. (Recall the definition of total variation distance!)

[Diagram: the ring, marking the starting point $Y_0$, the set $A_n(\theta)$, and the bound $E[Y_{T_n - \theta}] \le (3/8 - \gamma)n$.]

If $A_n(\theta)$ is chosen as above then $\pi(A_n(\theta)) = 1/4 + 2\gamma$.

Using Chebyshev's inequality:
$$P(Y_{T_n - \theta} \in A_n(\theta)) \le \frac{4^{1-\theta}}{3(3/8 - \gamma)^2}.$$

Now choose $\gamma$ and $\theta$ to make the difference between these large. We get a lower bound: $Y$ has not mixed at time $T_n - 4$.

Furthermore, we can choose $\gamma$ depending on $\theta$ to maximise the discrepancy. The lower bound on total variation distance tends to one quickly as $\theta$ increases:

[Plot: lower bound on total variation distance against $\theta$, rising rapidly towards 1 over $\theta \in [0, 20]$.]

Upper bound

Let $P$ be a probability on a group $G$. For a representation $\rho$, write $\hat{P}(\rho) = \sum_{g \in G} P(g) \rho(g)$ for the Fourier transform of $P$ at $\rho$. Then a basic but extremely useful result is the following.

Lemma (Diaconis and Shahshahani, 1981).
$$\|P - \pi\|_{TV}^2 \le \frac{1}{4} \sum_{\text{non-trivial irreducible } \rho} d_\rho \, \mathrm{Tr}\big(\hat{P}(\rho) \hat{P}(\rho)^*\big).$$

Since $\widehat{P * P}(\rho) = \hat{P}(\rho)\hat{P}(\rho)$, this lemma gives a bound on the total variation distance after any number of steps.
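For the abelian group $\mathbb{Z}_n$ the irreducibles are the 1-dimensional characters $\chi_s(x) = e^{2\pi i s x/n}$, so the lemma reduces to a bound via the discrete Fourier transform. This sketch (function name mine, test measure arbitrary) checks the inequality against the uniform distribution:

```python
import numpy as np

def dft_bound(mu):
    """Upper bound lemma specialised to Z_n: all irreducibles are characters,
    so ||mu - pi||_TV^2 <= (1/4) * sum_{s != 0} |mu_hat(s)|^2."""
    mu_hat = np.fft.fft(mu)               # mu_hat[s] = sum_x mu(x) e^{-2 pi i s x / n}
    return 0.25 * np.sum(np.abs(mu_hat[1:]) ** 2)

n = 7
mu = np.full(n, 1 / n) + 0.01 * np.array([3, -1, -1, 0, 0, -1, 0])  # a perturbed uniform pmf
tv = 0.5 * np.abs(mu - 1 / n).sum()
print(tv ** 2, dft_bound(mu))
```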

Writing $Y_k = \sum_{j=1}^{k} 2^{k+1-j} B_j \pmod n$, $B_j$ i.i.d. $\mathrm{Geom}(p_n)$, as before, we see that the distribution of $Y_k$ is just the convolution of measures $\mu_j$ given by
$$\mu_j(2^{k+1-j} s) = \frac{(1 - p_n)^s p_n}{1 - (1 - p_n)^n}, \qquad 1 \le j \le k.$$

Irreducible representations of $\mathbb{Z}_n$ are 1-dimensional, and we can write (for $s = 0, \dots, n-1$)
$$\hat{\mu}_j(s) = \frac{p_n}{1 - z_{js}(1 - p_n)}, \qquad \text{where } z_{js} = e^{i \frac{2\pi}{n} 2^{k+1-j} s}.$$

Putting this into the Lemma we obtain an upper bound:
$$\|\delta_0 P^t - \pi\|_{TV}^2 \le \frac{1}{4} \sum_{s=1}^{n-1} \prod_{k=1}^{t} \frac{p_n^2}{1 - 2(1 - p_n)\cos\!\big(\frac{2\pi}{n} 2^k s\big) + (1 - p_n)^2}.$$

Theorem. For any fixed $\theta \in \mathbb{N}$,
$$\limsup_{n \to \infty} \|\delta_0 P^{(n)}_{T_n + \theta} - \pi^{(n)}\|_{TV} \le h(\theta),$$
where $h$ tends to zero exponentially fast as $\theta \to \infty$. This completes our proof of a cutoff.
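The bound can be evaluated numerically and compared with the exact total variation distance of $Y_t$, whose distribution is recovered by inverting the product of the Fourier transforms $\hat{\mu}_j$. A sketch under arbitrary parameter choices (the exponents $2^{t+1-j}$, $j = 1, \dots, t$, form the same set as $2^k$, $k = 1, \dots, t$, so the product can be taken in either order):

```python
import numpy as np

n, p, t = 101, 0.05, 6

# Exact distribution of Y_t via characters of Z_n: the transform of Geom(p)
# reduced mod n is p / (1 - z*(1-p)) at z = e^{-2 pi i s / n}.
s = np.arange(n)
z = np.exp(-2j * np.pi * s / n)
Y_hat = np.ones(n, dtype=complex)
for k in range(1, t + 1):
    Y_hat *= p / (1 - z ** pow(2, k, n) * (1 - p))
dist = np.real(np.fft.ifft(Y_hat))        # pmf of Y_t
tv = 0.5 * np.abs(dist - 1 / n).sum()

# The slide's upper bound on the squared total variation distance:
terms = p**2 / (1 - 2 * (1 - p) * np.cos(2 * np.pi * np.multiply.outer(
        [pow(2, k, n) for k in range(1, t + 1)], s[1:]) / n) + (1 - p)**2)
bound = 0.25 * terms.prod(axis=0).sum()
print(tv ** 2, bound)
```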

Moving from Y to X

We've seen that $Y$ mixes in an interval of length $O(1)$ around $T_n = \log_2(n p_n)$: what does this tell us about the mixing time for $X$?

Corollary. For $p_n = 1/(2n^\alpha)$, with $\alpha \in (0, 1)$, $X$ exhibits a cutoff at time $T_n^X = T_n / p_n$, with window size $O\big(\xi_n \sqrt{T_n^X / p_n}\big)$, where $\xi_n$ is any sequence tending to infinity.

And finally: open problems

1. We can deal with more interesting steps in our walk, but not yet with more interesting jumps, e.g. consider
$$x \mapsto \begin{cases} x + 1 & \text{w.p. } 1 - p_n \\ 2x & \text{w.p. } p_n/2 \\ \left(\frac{n+1}{2}\right) x & \text{w.p. } p_n/2 \end{cases}$$
or more general rules, such as $x \mapsto x^2$...

2. Or how about this process?
$$x \mapsto \begin{cases} x + 1 & \text{w.p. } 1 - p_n \\ ax & \text{w.p. } p_n \end{cases}$$
where multiplication by $a$ is not invertible? (The stationary distribution won't be uniform...)