Mixing time for a random walk on a ring

Stephen Connor (joint work with Michael Bate)

Paris, September 2013
Introduction

Let $X$ be a discrete time Markov chain on a finite state space $S$, with transition matrix $P = (p_{ij})$, where $p_{ij} = \mathbb{P}(X_{t+1} = j \mid X_t = i)$. The vector $\pi$ is a stationary distribution for $X$ if $\pi = \pi P$. If $X$ is ergodic then we know that $\mathbb{P}(X_t = j \mid X_0 = i) \to \pi_j$ as $t \to \infty$, for all $i, j \in S$. But how fast does this convergence occur?
First choose a metric:

Definition. The total variation distance between two probability measures $\mu$ and $\mu'$ on $S$ is given by
$$\|\mu - \mu'\|_{TV} = \sup_{A \subseteq S} \big( \mu(A) - \mu'(A) \big) = \frac{1}{2} \sum_{s \in S} |\mu(s) - \mu'(s)|.$$

Suppose we start the chain $X$ at position $X_0 = x$. Write
$$d_x(t) = \|\delta_x P^t - \pi\|_{TV}.$$
This measures the distance between the distribution of $X_t$ and $\pi$.
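For concreteness, here is a minimal sketch of this definition in NumPy (the function name `tv_distance` is ours, not from the talk):

```python
import numpy as np

def tv_distance(mu, nu):
    """Total variation distance between two probability vectors on the
    same finite state space: half the L1 distance between them."""
    return 0.5 * np.abs(np.asarray(mu) - np.asarray(nu)).sum()
```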
Now introduce a parameter to measure time until $d_x(t)$ is small.

Definition. The mixing time of $X$, when $X_0 = x$, is
$$\tau_x(\varepsilon) = \min\{t : d_x(t) \le \varepsilon\},$$
and (by convention) we define $\tau_x = \tau_x(1/4)$.

Easy to show that $\tau_x(\varepsilon) \le \lceil \log_2 \varepsilon^{-1} \rceil \, \tau_x$.
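A brute-force sketch of this definition: push the point mass $\delta_x$ through the chain one step at a time until $d_x(t) \le \varepsilon$. The eigenvector computation of $\pi$ is only sensible for small chains, and all names are illustrative:

```python
import numpy as np

def mixing_time(P, x, eps=0.25, t_max=100_000):
    """tau_x(eps) = min{t : d_x(t) <= eps}, by direct iteration.
    Brute force: only for small transition matrices P."""
    evals, evecs = np.linalg.eig(P.T)            # pi is a left eigenvector of P
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
    pi /= pi.sum()                               # normalise to a probability vector
    dist = np.zeros(P.shape[0])
    dist[x] = 1.0                                # delta_x
    for t in range(1, t_max + 1):
        dist = dist @ P                          # one step of the chain
        if 0.5 * np.abs(dist - pi).sum() <= eps: # d_x(t) <= eps?
            return t
    return None
```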
Random walks on groups

Suppose now that our state space is a finite group $G$, and that $X$ has transitions governed by
$$\mathbb{P}(X_{t+1} = hg \mid X_t = g) = Q(h) \quad \text{for all } g, h \in G,$$
for some probability distribution $Q$ on $G$. Then $X$ is a random walk on a group, and (as long as $Q$ isn't concentrated on a subgroup) its stationary distribution is uniform: $\pi(g) = 1/|G|$ for all $g \in G$.

Distribution after repeated steps is given by convolution:
$$\mathbb{P}(X_2 = g \mid X_0 = \mathrm{id}) = Q * Q(g) = \sum_h Q(gh^{-1}) Q(h).$$
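On the cyclic group $\mathbb{Z}_n$ (the case relevant below) this is just cyclic convolution of probability vectors; a quick illustrative sketch, with names ours:

```python
import numpy as np

def convolve_Zn(Q1, Q2):
    """(Q1 * Q2)(g) = sum_h Q1(g - h) Q2(h), subtraction mod n:
    the law of a sum of two independent Z_n-valued variables."""
    n = len(Q1)
    return np.array([sum(Q1[(g - h) % n] * Q2[h] for h in range(n))
                     for g in range(n)])

# e.g. two steps of a walk started at the identity have law convolve_Zn(Q, Q)
```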
We're often interested in a natural sequence of processes $X^{(n)}$ on groups $G^{(n)}$ of increasing size: how does the mixing time $\tau_x^{(n)}$ scale with $n$?

In lots of nice examples a cutoff phenomenon is exhibited: for all $\varepsilon > 0$,
$$\lim_{n \to \infty} \frac{\tau_x^{(n)}(\varepsilon)}{\tau_x^{(n)}(1 - \varepsilon)} = 1.$$

[Figure: total variation distance as a function of time, staying near 1 and then dropping sharply to 0 at the cutoff.]
Random walk on a ring

What if we add some additional structure, and try to move from a walk on a group to a walk on a ring?

For example: random walk on $\mathbb{Z}_n$ ($n$ odd) with
$$x \mapsto \begin{cases} x + 1 & \text{w.p. } 1 - p_n \\ 2x & \text{w.p. } p_n \end{cases}$$

[Figure: the walk on $\mathbb{Z}_7$, with each `+1' edge labelled $1 - p$ and each doubling edge labelled $p$.]
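This walk is easy to simulate directly; a minimal sketch (NumPy, function name ours):

```python
import numpy as np

def ring_walk(n, p_n, t, x0=0, seed=None):
    """Run t moves of the walk on Z_n: add 1 with probability 1 - p_n
    (a `step'), double with probability p_n (a `jump')."""
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(t):
        if rng.random() < p_n:
            x = (2 * x) % n    # jump: x -> 2x
        else:
            x = (x + 1) % n    # step: x -> x + 1
    return x
```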
Related results

Chung, Diaconis and Graham (1987) study the process (used in random number generation)
$$x \mapsto \begin{cases} ax - 1 & \text{w.p. } 1/3 \\ ax & \text{w.p. } 1/3 \\ ax + 1 & \text{w.p. } 1/3 \end{cases}$$

When $a = 1$ there exist constants $C$ and $C'$ such that:
$$e^{-Ct/n^2} < \|P_t^{(n)} - \pi^{(n)}\|_{TV} < e^{-C' t/n^2}.$$

When $a = 2$ and $n = 2^m - 1$ there exist constants $c$ and $c'$ such that:
- for $t_n \ge c \log n \log \log n$, $\|P_{t_n}^{(n)} - \pi^{(n)}\|_{TV} \to 0$ as $n \to \infty$;
- for $t_n \le c' \log n \log \log n$, $\|P_{t_n}^{(n)} - \pi^{(n)}\|_{TV} > \varepsilon$ as $n \to \infty$.
Back to our process...

General problem: the distribution of $X_t$ isn't given by convolution,
$$X_t = 2 I_t X_{t-1} + (1 - I_t)(X_{t-1} + 1), \quad \text{where } I_t \sim \mathrm{Bern}(p_n).$$

But for this relatively simple walk, we can get around this by looking at the process subsampled at jump times. (Here we call a `+1' move a step and a `$\times 2$' move a jump.)

So consider (with $X_0 = Y_0 = 0$)
$$Y_k = \sum_{j=1}^{k} 2^{k+1-j} B_j \pmod{n}, \qquad B_j \text{ i.i.d. } \mathrm{Geom}(p_n)$$
(i.e. $Y_k$ is the position of $X$ immediately following the $k$th jump).
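A sketch of the subsampled chain, under the assumption that $B_j \in \{0, 1, 2, \dots\}$ counts the steps taken before the $j$th jump (NumPy's geometric sampler is supported on $\{1, 2, \dots\}$, hence the shift below); names are ours:

```python
import numpy as np

def subsampled_Y(n, p_n, k, seed=None):
    """Y_k = sum_{j=1}^k 2^(k+1-j) B_j (mod n): the walk's position
    immediately after its k-th jump, with B_j i.i.d. Geom(p_n)."""
    rng = np.random.default_rng(seed)
    B = rng.geometric(p_n, size=k) - 1           # shift support to {0, 1, ...}
    return sum(pow(2, k + 1 - j, n) * B[j - 1]   # 2^(k+1-j) reduced mod n
               for j in range(1, k + 1)) % n
```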
Plan

1. Find lower bound for mixing time of $Y$;
2. Find upper bound for mixing time of $Y$;
3. Try to relate these back to the mixing time for $X$.

Restrict attention to $p_n = \frac{1}{2n^\alpha}$, $\alpha \in (0, 1)$. We expect things to happen (for $Y$) sometime around
$$T_n := \log_2(n p_n) \approx (1 - \alpha) \log_2 n.$$

Theorem. $Y$ exhibits a cutoff at time $T_n$, with window size $O(1)$.
Lower bound

For a lower bound, we simply find a set $A_n(\theta)$ (of considerable size) that $Y$ has very little chance of hitting before time $T_n - \theta$, for some $\theta \in \mathbb{N}$. (Recall the definition of total variation distance!)

[Figure: the ring, marking $Y_0$, the mean position $\mathbb{E}[Y_{T_n - \theta}]$, the target set $A_n(\theta)$, and the distance $(3/8 - \gamma)n$.]
If $A_n(\theta)$ is chosen as above then $\pi(A_n(\theta)) = 1/4 + 2\gamma$.

Using Chebyshev's inequality:
$$\mathbb{P}(Y_{T_n - \theta} \in A_n(\theta)) \le \frac{4^{1 - \theta}}{3(3/8 - \gamma)^2}.$$

Now choose $\gamma$ and $\theta$ to make the difference between these large: by the definition of total variation distance,
$$d_0(T_n - \theta) \ge \pi(A_n(\theta)) - \mathbb{P}(Y_{T_n - \theta} \in A_n(\theta)).$$
We get a lower bound: $Y$ has not mixed at time $T_n - 4$.
Furthermore, we can choose $\gamma$ depending on $\theta$ to maximise the discrepancy. The lower bound on total variation distance tends to one quickly as $\theta$ increases:

[Figure: the optimised lower bound as a function of $\theta \in [0, 20]$, rising rapidly towards 1.]
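A quick numerical check of the two displays above, maximising over the free parameter $\gamma$ (which we take to range over $(0, 3/8)$ so that the Chebyshev bound makes sense); a sketch, not the talk's exact optimisation:

```python
import numpy as np

def tv_lower_bound(theta):
    """Maximise  1/4 + 2*gamma - 4**(1 - theta) / (3 * (3/8 - gamma)**2)
    over gamma in (0, 3/8): a lower bound on d_0(T_n - theta)."""
    gammas = np.linspace(1e-4, 3/8 - 1e-4, 2000)
    vals = 0.25 + 2 * gammas - 4.0**(1 - theta) / (3 * (3/8 - gammas)**2)
    return max(vals.max(), 0.0)

# climbs towards 1 quickly: theta = 4 already gives a bound above 1/4
print([round(tv_lower_bound(th), 3) for th in range(1, 11)])
```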
Upper bound

Let $P$ be a probability on a group $G$. For a representation $\rho$, write
$$\hat{P}(\rho) = \sum_{g \in G} P(g) \rho(g)$$
for the Fourier transform of $P$ at $\rho$. Then a basic but extremely useful result is the following:

Lemma (Diaconis and Shahshahani, 1981).
$$\|P - \pi\|_{TV}^2 \le \frac{1}{4} \sum_{\text{non-triv. irr. } \rho} d_\rho \, \mathrm{Tr}\big( \hat{P}(\rho) \hat{P}(\rho)^* \big)$$

Since $\widehat{P * P}(\rho) = \hat{P}(\rho) \hat{P}(\rho)$, this lemma gives a bound on total variation distance after any number of steps.
Writing
$$Y_k = \sum_{j=1}^{k} 2^{k+1-j} B_j \pmod{n}, \qquad B_j \text{ i.i.d. } \mathrm{Geom}(p_n),$$
as before, we see that the distribution of $Y_k$ is just the convolution of measures $\mu_j$ given by
$$\mu_j(2^{k+1-j} s) = \frac{(1 - p_n)^s p_n}{1 - (1 - p_n)^n}, \qquad 1 \le j \le k.$$

Irreducible representations of $\mathbb{Z}_n$ are 1-dimensional, and we can write (for $s = 0, \dots, n - 1$)
$$\hat{\mu}_j(s) = \frac{p_n}{1 - z_{js}(1 - p_n)}, \qquad \text{where } z_{js} = e^{i \frac{2\pi}{n} 2^{k+1-j} s}.$$
Putting this into the Lemma we obtain an upper bound:
$$\|\delta_0 P^t - \pi\|_{TV}^2 \le \frac{1}{4} \sum_{s=1}^{n-1} \prod_{k=1}^{t} \frac{p_n^2}{1 - 2(1 - p_n)\cos\!\left(\frac{2\pi}{n} 2^k s\right) + (1 - p_n)^2}$$

Theorem. For any fixed $\theta \in \mathbb{N}$,
$$\limsup_{n \to \infty} \left\| \delta_0 P^{(n)}_{T_n + \theta} - \pi^{(n)} \right\|_{TV} \le h(\theta),$$
where $h$ tends to zero exponentially fast as $\theta \to \infty$. This completes our proof of a cutoff.
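The displayed upper bound is easy to evaluate numerically for small $n$; a direct sketch (our code, $O(nt)$ work, so only for modest $n$):

```python
import numpy as np

def fourier_upper_bound(n, p_n, t):
    """Evaluate (1/4) * sum_{s=1}^{n-1} prod_{k=1}^{t} |mu_hat(s)|^2,
    the upper bound on ||delta_0 P^t - pi||_TV^2 displayed above."""
    total = 0.0
    for s in range(1, n):
        prod = 1.0
        for k in range(1, t + 1):
            # cos(2*pi*2^k*s/n); reduce 2^k*s mod n first (cos is n-periodic here)
            c = np.cos(2 * np.pi * ((pow(2, k, n) * s) % n) / n)
            prod *= p_n**2 / (1 - 2 * (1 - p_n) * c + (1 - p_n)**2)
        total += prod
    return 0.25 * total
```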
Moving from Y to X

We've seen that $Y$ mixes in an interval of length $O(1)$ around $T_n = \log_2(n p_n)$: what does this tell us about the mixing time for $X$?

Corollary. For $p_n = 1/(2n^\alpha)$, with $\alpha \in (0, 1)$, $X$ exhibits a cutoff at time $T_n^X = T_n / p_n$, with window size of $O\big(\xi_n \sqrt{T_n^X / p_n}\big)$, where $\xi_n$ is any sequence tending to infinity.
And finally: open problems

1. We can deal with more interesting steps in our walk, but not yet with more interesting jumps, e.g. consider
$$x \mapsto \begin{cases} x + 1 & \text{w.p. } 1 - p_n \\ 2x & \text{w.p. } p_n/2 \\ \left(\frac{n+1}{2}\right) x & \text{w.p. } p_n/2 \end{cases}$$
or more general rules, such as $x \mapsto x^2$...

2. Or how about this process,
$$x \mapsto \begin{cases} x + 1 & \text{w.p. } 1 - p_n \\ ax & \text{w.p. } p_n \end{cases}$$
where multiplication by $a$ is not invertible? (Stationary distribution won't be uniform...)