Mixing time for a random walk on a ring

Stephen Connor (joint work with Michael Bate)

Paris, September 2013
Introduction

Let $X$ be a discrete time Markov chain on a finite state space $S$, with transition matrix $P = (p_{ij})$, where $p_{ij} = \mathbb{P}(X_{t+1} = j \mid X_t = i)$. The vector $\pi$ is a stationary distribution for $X$ if $\pi = \pi P$. If $X$ is ergodic then we know that $\mathbb{P}(X_t = j \mid X_0 = i) \to \pi_j$ as $t \to \infty$, for all $i, j \in S$. But how fast does this convergence occur?
First choose a metric:

Definition. The total variation distance between two probability measures $\mu$ and $\mu'$ on $S$ is given by
$$\|\mu - \mu'\|_{TV} = \sup_{A \subseteq S} \big( \mu(A) - \mu'(A) \big) = \frac{1}{2} \sum_{s \in S} |\mu(s) - \mu'(s)|.$$

Suppose we start the chain $X$ at position $X_0 = x$. Write
$$d_x(t) = \|\delta_x P^t - \pi\|_{TV}.$$
This measures the distance between the distribution of $X_t$ and $\pi$.
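For concreteness, here is a minimal sketch of this definition in NumPy (the function name `tv_distance` is ours, not from the talk):

```python
import numpy as np

def tv_distance(mu, nu):
    """Total variation distance between two probability vectors on the
    same finite state space: half the L1 distance between them."""
    return 0.5 * np.abs(np.asarray(mu) - np.asarray(nu)).sum()
```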
Now introduce a parameter to measure time until $d_x(t)$ is small.

Definition. The mixing time of $X$, when $X_0 = x$, is
$$\tau_x(\varepsilon) = \min\{t : d_x(t) \le \varepsilon\},$$
and (by convention) we define $\tau_x = \tau_x(1/4)$.

Easy to show that $\tau_x(\varepsilon) \le \lceil \log_2 \varepsilon^{-1} \rceil \, \tau_x$.
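A brute-force sketch of this definition: push the point mass $\delta_x$ through the chain one step at a time until $d_x(t) \le \varepsilon$. The eigenvector computation of $\pi$ is only sensible for small chains, and all names are illustrative:

```python
import numpy as np

def mixing_time(P, x, eps=0.25, t_max=100_000):
    """tau_x(eps) = min{t : d_x(t) <= eps}, by direct iteration.
    Brute force: only for small transition matrices P."""
    evals, evecs = np.linalg.eig(P.T)            # pi is a left eigenvector of P
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
    pi /= pi.sum()                               # normalise to a probability vector
    dist = np.zeros(P.shape[0])
    dist[x] = 1.0                                # delta_x
    for t in range(1, t_max + 1):
        dist = dist @ P                          # one step of the chain
        if 0.5 * np.abs(dist - pi).sum() <= eps: # d_x(t) <= eps?
            return t
    return None
```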
Random walks on groups

Suppose now that our state space is a finite group $G$, and that $X$ has transitions governed by
$$\mathbb{P}(X_{t+1} = hg \mid X_t = g) = Q(h) \quad \text{for all } g, h \in G,$$
for some probability distribution $Q$ on $G$. Then $X$ is a random walk on a group, and (as long as $Q$ isn't concentrated on a subgroup) its stationary distribution is uniform: $\pi(g) = 1/|G|$ for all $g \in G$.

Distribution after repeated steps is given by convolution:
$$\mathbb{P}(X_2 = g \mid X_0 = \mathrm{id}) = Q * Q(g) = \sum_h Q(gh^{-1}) Q(h).$$
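On the cyclic group $\mathbb{Z}_n$ (the case relevant below) this is just cyclic convolution of probability vectors; a quick illustrative sketch, with names ours:

```python
import numpy as np

def convolve_Zn(Q1, Q2):
    """(Q1 * Q2)(g) = sum_h Q1(g - h) Q2(h), subtraction mod n:
    the law of a sum of two independent Z_n-valued variables."""
    n = len(Q1)
    return np.array([sum(Q1[(g - h) % n] * Q2[h] for h in range(n))
                     for g in range(n)])

# e.g. two steps of a walk started at the identity have law convolve_Zn(Q, Q)
```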
We're often interested in a natural sequence of processes $X^{(n)}$ on groups $G^{(n)}$ of increasing size: how does the mixing time $\tau_x^{(n)}$ scale with $n$?

In lots of nice examples a cutoff phenomenon is exhibited: for all $\varepsilon > 0$,
$$\lim_{n \to \infty} \frac{\tau_x^{(n)}(\varepsilon)}{\tau_x^{(n)}(1 - \varepsilon)} = 1.$$

[Figure: total variation distance as a function of time, staying near 1 and then dropping sharply to 0 at the cutoff.]
Random walk on a ring

What if we add some additional structure, and try to move from a walk on a group to a walk on a ring?

For example: random walk on $\mathbb{Z}_n$ ($n$ odd) with
$$x \mapsto \begin{cases} x + 1 & \text{w.p. } 1 - p_n \\ 2x & \text{w.p. } p_n \end{cases}$$

[Figure: the walk on $\mathbb{Z}_7$, with each `+1' edge labelled $1 - p$ and each doubling edge labelled $p$.]
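This walk is easy to simulate directly; a minimal sketch (NumPy, function name ours):

```python
import numpy as np

def ring_walk(n, p_n, t, x0=0, seed=None):
    """Run t moves of the walk on Z_n: add 1 with probability 1 - p_n
    (a `step'), double with probability p_n (a `jump')."""
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(t):
        if rng.random() < p_n:
            x = (2 * x) % n    # jump: x -> 2x
        else:
            x = (x + 1) % n    # step: x -> x + 1
    return x
```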
Related results

Chung, Diaconis and Graham (1987) study the process (used in random number generation)
$$x \mapsto \begin{cases} ax - 1 & \text{w.p. } 1/3 \\ ax & \text{w.p. } 1/3 \\ ax + 1 & \text{w.p. } 1/3 \end{cases}$$

When $a = 1$ there exist constants $C$ and $C'$ such that:
$$e^{-Ct/n^2} < \|P_t^{(n)} - \pi^{(n)}\|_{TV} < e^{-C' t/n^2}.$$

When $a = 2$ and $n = 2^m - 1$ there exist constants $c$ and $c'$ such that:
- for $t_n \ge c \log n \log \log n$, $\|P_{t_n}^{(n)} - \pi^{(n)}\|_{TV} \to 0$ as $n \to \infty$;
- for $t_n \le c' \log n \log \log n$, $\|P_{t_n}^{(n)} - \pi^{(n)}\|_{TV} > \varepsilon$ as $n \to \infty$.
Back to our process...

General problem: the distribution of $X_t$ isn't given by convolution,
$$X_t = 2 I_t X_{t-1} + (1 - I_t)(X_{t-1} + 1), \quad \text{where } I_t \sim \mathrm{Bern}(p_n).$$

But for this relatively simple walk, we can get around this by looking at the process subsampled at jump times. (Here we call a `+1' move a step and a `$\times 2$' move a jump.)

So consider (with $X_0 = Y_0 = 0$)
$$Y_k = \sum_{j=1}^{k} 2^{k+1-j} B_j \pmod{n}, \qquad B_j \text{ i.i.d. } \mathrm{Geom}(p_n)$$
(i.e. $Y_k$ is the position of $X$ immediately following the $k$th jump).
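A sketch of the subsampled chain, under the assumption that $B_j \in \{0, 1, 2, \dots\}$ counts the steps taken before the $j$th jump (NumPy's geometric sampler is supported on $\{1, 2, \dots\}$, hence the shift below); names are ours:

```python
import numpy as np

def subsampled_Y(n, p_n, k, seed=None):
    """Y_k = sum_{j=1}^k 2^(k+1-j) B_j (mod n): the walk's position
    immediately after its k-th jump, with B_j i.i.d. Geom(p_n)."""
    rng = np.random.default_rng(seed)
    B = rng.geometric(p_n, size=k) - 1           # shift support to {0, 1, ...}
    return sum(pow(2, k + 1 - j, n) * B[j - 1]   # 2^(k+1-j) reduced mod n
               for j in range(1, k + 1)) % n
```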
Plan

1. Find lower bound for mixing time of $Y$;
2. Find upper bound for mixing time of $Y$;
3. Try to relate these back to the mixing time for $X$.

Restrict attention to $p_n = \frac{1}{2n^\alpha}$, $\alpha \in (0, 1)$. We expect things to happen (for $Y$) sometime around
$$T_n := \log_2(n p_n) \approx (1 - \alpha) \log_2 n.$$

Theorem. $Y$ exhibits a cutoff at time $T_n$, with window size $O(1)$.
Lower bound

For a lower bound, we simply find a set $A_n(\theta)$ (of considerable size) that $Y$ has very little chance of hitting before time $T_n - \theta$, for some $\theta \in \mathbb{N}$. (Recall the definition of total variation distance!)

[Figure: the ring, marking $Y_0$, the mean position $\mathbb{E}[Y_{T_n - \theta}]$, the target set $A_n(\theta)$, and the distance $(3/8 - \gamma)n$.]
If $A_n(\theta)$ is chosen as above then $\pi(A_n(\theta)) = 1/4 + 2\gamma$.

Using Chebyshev's inequality:
$$\mathbb{P}(Y_{T_n - \theta} \in A_n(\theta)) \le \frac{4^{1 - \theta}}{3(3/8 - \gamma)^2}.$$

Now choose $\gamma$ and $\theta$ to make the difference between these large: by the definition of total variation distance,
$$d_0(T_n - \theta) \ge \pi(A_n(\theta)) - \mathbb{P}(Y_{T_n - \theta} \in A_n(\theta)).$$
We get a lower bound: $Y$ has not mixed at time $T_n - 4$.
Furthermore, we can choose $\gamma$ depending on $\theta$ to maximise the discrepancy. The lower bound on total variation distance tends to one quickly as $\theta$ increases:

[Figure: the optimised lower bound as a function of $\theta \in [0, 20]$, rising rapidly towards 1.]
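A quick numerical check of the two displays above, maximising over the free parameter $\gamma$ (which we take to range over $(0, 3/8)$ so that the Chebyshev bound makes sense); a sketch, not the talk's exact optimisation:

```python
import numpy as np

def tv_lower_bound(theta):
    """Maximise  1/4 + 2*gamma - 4**(1 - theta) / (3 * (3/8 - gamma)**2)
    over gamma in (0, 3/8): a lower bound on d_0(T_n - theta)."""
    gammas = np.linspace(1e-4, 3/8 - 1e-4, 2000)
    vals = 0.25 + 2 * gammas - 4.0**(1 - theta) / (3 * (3/8 - gammas)**2)
    return max(vals.max(), 0.0)

# climbs towards 1 quickly: theta = 4 already gives a bound above 1/4
print([round(tv_lower_bound(th), 3) for th in range(1, 11)])
```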
Upper bound

Let $P$ be a probability on a group $G$. For a representation $\rho$, write
$$\hat{P}(\rho) = \sum_{g \in G} P(g) \rho(g)$$
for the Fourier transform of $P$ at $\rho$. Then a basic but extremely useful result is the following:

Lemma (Diaconis and Shahshahani, 1981).
$$\|P - \pi\|_{TV}^2 \le \frac{1}{4} \sum_{\text{non-triv. irr. } \rho} d_\rho \, \mathrm{Tr}\big( \hat{P}(\rho) \hat{P}(\rho)^* \big)$$

Since $\widehat{P * P}(\rho) = \hat{P}(\rho) \hat{P}(\rho)$, this lemma gives a bound on total variation distance after any number of steps.
Writing
$$Y_k = \sum_{j=1}^{k} 2^{k+1-j} B_j \pmod{n}, \qquad B_j \text{ i.i.d. } \mathrm{Geom}(p_n),$$
as before, we see that the distribution of $Y_k$ is just the convolution of measures $\mu_j$ given by
$$\mu_j(2^{k+1-j} s) = \frac{(1 - p_n)^s p_n}{1 - (1 - p_n)^n}, \qquad 1 \le j \le k.$$

Irreducible representations of $\mathbb{Z}_n$ are 1-dimensional, and we can write (for $s = 0, \dots, n - 1$)
$$\hat{\mu}_j(s) = \frac{p_n}{1 - z_{js}(1 - p_n)}, \qquad \text{where } z_{js} = e^{i \frac{2\pi}{n} 2^{k+1-j} s}.$$
Putting this into the Lemma we obtain an upper bound:
$$\|\delta_0 P^t - \pi\|_{TV}^2 \le \frac{1}{4} \sum_{s=1}^{n-1} \prod_{k=1}^{t} \frac{p_n^2}{1 - 2(1 - p_n)\cos\!\left(\frac{2\pi}{n} 2^k s\right) + (1 - p_n)^2}$$

Theorem. For any fixed $\theta \in \mathbb{N}$,
$$\limsup_{n \to \infty} \left\| \delta_0 P^{(n)}_{T_n + \theta} - \pi^{(n)} \right\|_{TV} \le h(\theta),$$
where $h$ tends to zero exponentially fast as $\theta \to \infty$. This completes our proof of a cutoff.
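The displayed upper bound is easy to evaluate numerically for small $n$; a direct sketch (our code, $O(nt)$ work, so only for modest $n$):

```python
import numpy as np

def fourier_upper_bound(n, p_n, t):
    """Evaluate (1/4) * sum_{s=1}^{n-1} prod_{k=1}^{t} |mu_hat(s)|^2,
    the upper bound on ||delta_0 P^t - pi||_TV^2 displayed above."""
    total = 0.0
    for s in range(1, n):
        prod = 1.0
        for k in range(1, t + 1):
            # cos(2*pi*2^k*s/n); reduce 2^k*s mod n first (cos is n-periodic here)
            c = np.cos(2 * np.pi * ((pow(2, k, n) * s) % n) / n)
            prod *= p_n**2 / (1 - 2 * (1 - p_n) * c + (1 - p_n)**2)
        total += prod
    return 0.25 * total
```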
Moving from Y to X

We've seen that $Y$ mixes in an interval of length $O(1)$ around $T_n = \log_2(n p_n)$: what does this tell us about the mixing time for $X$?

Corollary. For $p_n = 1/(2n^\alpha)$, with $\alpha \in (0, 1)$, $X$ exhibits a cutoff at time $T_n^X = T_n / p_n$, with window size of $O\big(\xi_n \sqrt{T_n^X / p_n}\big)$, where $\xi_n$ is any sequence tending to infinity.
And finally: open problems

1. We can deal with more interesting steps in our walk, but not yet with more interesting jumps, e.g. consider
$$x \mapsto \begin{cases} x + 1 & \text{w.p. } 1 - p_n \\ 2x & \text{w.p. } p_n/2 \\ \left(\frac{n+1}{2}\right) x & \text{w.p. } p_n/2 \end{cases}$$
or more general rules, such as $x \mapsto x^2$...

2. Or how about this process,
$$x \mapsto \begin{cases} x + 1 & \text{w.p. } 1 - p_n \\ ax & \text{w.p. } p_n \end{cases}$$
where multiplication by $a$ is not invertible? (Stationary distribution won't be uniform...)