
Introduction to Markov Chains and Riffle Shuffling

Nina Kuklisova

Math REU 2012, University of Chicago

September 27, 2012

Abstract

In this paper, we introduce Markov chains and their basic properties, and we look at a simple application in shuffling cards. We derive the rate of convergence to the stationary distribution for the most common shuffling method, the dovetail shuffle.

Contents

1 Introduction
2 Markov Chains
  2.1 Basic Definitions and Concepts
  2.2 Definition of Markov Chains
  2.3 Coupling
  2.4 Stopping and Stationary Times
  2.5 Time Reversal
3 Riffle Shuffles
  3.1 Gilbert-Shannon-Reeds Model
  3.2 Approach to Uniformity in the GSR Shuffling Model

1 Introduction

A Markov chain is a type of stochastic process that was first studied by Markov in 1906. A process consists of a sequence of states; in a Markov chain, each state depends only on the previous one. Markov chains are of great interest because they can model many different problems. They were studied rigorously by P. Diaconis [1]; further properties of Markov chains can be found in [2]. The motivating example of this paper is card shuffling.

This paper is mostly expository. In the first part, we introduce Markov chains as they are described in [2]; then, we reprove one of the first fundamental theorems on this topic, which was first established in [3].

This paper was written without any previous knowledge of statistics, and it does not assume the reader is familiar with this field. For concepts that are more complex and need a more detailed explanation, we give references to the literature.

2 Markov Chains

2.1 Basic Definitions and Concepts

Most of the material in this section is explained in further detail in [2]. The majority of the structures that we talk about are probability distributions: these represent the range of possible outcomes and their respective probabilities. The total probability of all outcomes must be one.

Definition 2.1. A probability distribution on a countable set $\Omega$ is a function $P : \Omega \to [0, 1]$ such that $\sum_{A \in \Omega} P(A) = 1$.

A random variable $X$ is a measurable function defined on $\Omega$. The probability measure $P_X$ of the random variable $X$ on $\mathbb{R}$ is called its distribution; for a Borel set $B$ it is defined by $P_X(B) := P(X \in B)$. Two events $A_1, A_2$ are independent if $P(A_1 \cap A_2) = P(A_1)\,P(A_2)$.

Definition 2.2. A probability (state) space is a triple $(\Omega, \mathcal{F}, P)$, where $\Omega$ is the set of outcomes, $\mathcal{F}$ the set of events, and $P$ a probability measure on $\Omega$, so that $P : \mathcal{F} \to [0, 1]$ assigns probabilities to events. We will denote this probability space by $P$.

For example, for a coin toss, $\Omega = \{\text{Heads } (H), \text{Tails } (T)\}$; if we are considering two tosses, $\mathcal{F}$ contains the outcomes $\{(HH), (TT), (HT), (TH)\}$ and $P(H,H) = P(H,T) = P(T,H) = P(T,T) = 1/4$.

There are two different types of variables that make up probability distributions: discrete and absolutely continuous.

Definition 2.3. A discrete variable $X$ has only a finite number of possible outcomes.

Its simplest example is the coin toss. For this type of variable, there is a finite set, called the support of $X$, that contains all the possible values of $X$. If we denote these values $x_1, x_2, \ldots, x_n$, then
$$\sum_{i=1}^{n} P(x_i) = 1.$$
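To keep these definitions concrete, here is a small sketch (ours, not from the paper) that encodes the two-toss space above, checks that the probabilities sum to 1 as in Definition 2.1, and verifies the product rule for two independent events; the helper `prob` and the events `A1`, `A2` are our own illustrative names.

```python
from itertools import product
from fractions import Fraction

# The two-toss probability space of Definition 2.2: four equally likely outcomes.
omega = list(product("HT", repeat=2))
P = {w: Fraction(1, 4) for w in omega}
assert sum(P.values()) == 1          # Definition 2.1: total probability is one

def prob(event):
    """Probability of an event, i.e. of a subset of omega."""
    return sum(P[w] for w in event)

A1 = {w for w in omega if w[0] == "H"}       # first toss is heads
A2 = {w for w in omega if w[1] == "T"}       # second toss is tails
print(prob(A1 & A2) == prob(A1) * prob(A2))  # True: the events are independent
```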

Definition 2.4. An absolutely continuous variable can take on any value within a certain range, just like a continuous function defined on some interval.

For this type of variable, the probability that the random variable takes a value in an interval $A$ equals the area under a curve over this interval, relative to the area under the whole curve. This curve is the graph of the density function: a function $p(x)$ on $\mathbb{R}$ such that
$$P_X(A) = \int_A p(x)\,dx \quad \text{for any } A \subseteq \Omega$$
is the probability that the variable takes a value in $A$. Thus, for any domain, we can define some kind of probability space. The density function shows how close to each other the events occur.

Definition 2.5. For a random variable $X$, we can define its expectation: for a discrete random variable $X$ with support $\Omega$, the expectation $E(X)$ is defined by
$$E(X) = \sum_{x \in \Omega} x\,P\{X = x\}; \qquad (1)$$
for an absolutely continuous random variable $X$ with density $p(x)$,
$$E(X) = \int_{\mathbb{R}} x\,p(x)\,dx. \qquad (2)$$

Often, the initial distribution is concentrated at a single definite starting state $x$. This is denoted by $P_x$ and $E_x$.

For more complex models, it is useful to work with general probability distributions: in most processes, the probability of each outcome is different, and we can only say that the sum of all these probabilities over a distribution is 1. Most of the distributions described in this paper involve independent and identically distributed random variables, which we will simply denote as i.i.d.

When we are interested in a specific range of outcomes, we study marginal distributions. These can be defined for both discrete and absolutely continuous distributions $F$.

Definition 2.6. If $F$ is the distribution of the variables $(X_1, X_2, \ldots, X_d)$, the marginal distribution of $X_i$ is $F_i(x) = P(X_i \le x)$.

So, the marginal distribution for a variable can be visualized as a cut of the original distribution through the variable $X_i$.
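For discrete variables, the same idea reads as summing the joint probabilities over all the other coordinates. The sketch below (ours) recovers the marginal law of each toss from the joint law of the two-toss example; `marginal` is a hypothetical helper, not notation from the paper.

```python
from itertools import product

# Joint law of two independent fair coin tosses.
joint = {(a, b): 0.25 for a, b in product("HT", repeat=2)}

def marginal(joint_pmf, coord):
    """Marginal law of one coordinate: sum the joint law over the others."""
    out = {}
    for outcome, p in joint_pmf.items():
        out[outcome[coord]] = out.get(outcome[coord], 0.0) + p
    return out

print(marginal(joint, 0))   # {'H': 0.5, 'T': 0.5}
print(marginal(joint, 1))   # {'H': 0.5, 'T': 0.5}
```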

In this paper, we mostly use only two variables. Their marginal distributions are denoted by the Greek letters $\mu(x)$ and $\nu(x)$. We are often interested in situations when some specific conditions are satisfied, so we look at joint distributions: for $x$ and $y$ defined on a probability space $\Omega \times \Omega$, we can define a joint distribution $P(x, y) = P(X = x \text{ and } Y = y)$. Then, for independent random variables, $P(x, y) = P(x)\,P(y)$.

2.2 Definition of Markov Chains

What we call a chain here is a sequence of random variables where each somehow depends on the previous one. It is a process that changes in time increments $\delta t$, and we denote its state at time $t$ by $X_t$. In the case of a coin toss, at each step we have the same probability for all outcomes, regardless of the previous tosses. This makes the coin toss a good example of a stochastic process with the Markov property. So, we can establish the main properties of the distributions studied in this paper.

Definition 2.7. The Markov property for a sequence of random variables $(X_0, X_1, \ldots, X_t, X_{t+1})$ defined on $\Omega$ means that the value of the last element depends only on the element just before it:
$$P\{X_{t+1} = y \mid X_t = x, \ldots, X_0 = z\} = P\{X_{t+1} = y \mid X_t = x\}.$$

For a process with discrete variables, each state can be represented by a vector. We can imagine these processes as applying a transition matrix $P$ many times. At time $t$, $t$ transitions have occurred, so we write $P^t$; $P^t(x, y)$ denotes the probability of getting from $x$ to $y$ in $t$ steps. We will denote events by $H_{t-1} = \bigcap_{s=0}^{t-1} \{X_s = x_s\}$, where the $x_s$ can be any outcomes.

Definition 2.8. A sequence of random variables $(X_0, X_1, \ldots, X_t, X_{t+1}, \ldots)$ is a Markov chain with state space $\Omega$ and transition matrix $P$ if for all $x, y \in \Omega$, all $t \ge 1$, and all events $H_{t-1} = \bigcap_{s=0}^{t-1} \{X_s = x_s\}$ satisfying
$$P(H_{t-1} \cap \{X_t = x\}) > 0,$$
we have
$$P\{X_{t+1} = y \mid H_{t-1} \cap \{X_t = x\}\} = P\{X_{t+1} = y \mid X_t = x\} = P(x, y).$$

We have just defined a Markov chain as a sequence of random variables for which, whenever the outcome $x$ has nonzero probability at the current step, the probability that the outcome at the next step will be $y$ depends only on $x$, through the matrix entry $P(x, y)$. As we have seen, one of the simplest Markov chains that we could think of is the coin toss.
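The following simulation sketch (ours, not part of the original paper) makes Definition 2.8 concrete for a small two-state chain: the fraction of runs ending in state $y$ after $t$ steps approaches the matrix power entry $P^t(x, y)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# A two-state chain: state 0 stays put with probability 0.9, state 1 with 0.5.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

def run_chain(P, x0, t):
    """Simulate t steps; row P[x] is the distribution of the next state."""
    x = x0
    for _ in range(t):
        x = rng.choice(len(P), p=P[x])
    return x

t, trials = 5, 100_000
hits = sum(run_chain(P, 0, t) == 1 for _ in range(trials))
print("empirical P^t(0,1):", hits / trials)
print("exact     P^t(0,1):", np.linalg.matrix_power(P, t)[0, 1])
```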

If we denote the set of possible outcomes at each step by $\Lambda$, we can define a random mapping representation.

Definition 2.9. A random mapping representation of a transition matrix $P$ on state space $\Omega$ is a function $f : \Omega \times \Lambda \to \Omega$, together with a $\Lambda$-valued random variable $Z$, satisfying
$$P\{f(x, Z) = y\} = P(x, y).$$

We needed such a precise definition because it can be widely used. In this paper, we will see its use in studying card shuffling.

Theorem 2.10. Every transition matrix on a finite state space has a random mapping representation.

Proof. Take a Markov chain with state space $\Omega = \{x_1, x_2, \ldots, x_n\}$ and transition matrix $P$. Choose the auxiliary random variable $Z$ uniformly from the interval $\Lambda = [0, 1]$. Define one auxiliary function giving the probability that after $x_j$ we get at most $x_k$:
$$F_{j,k} = \sum_{i=1}^{k} P(x_j, x_i),$$
and another auxiliary function for which
$$f(x_j, z) := x_k \quad \text{when } F_{j,k-1} < z \le F_{j,k};$$
then
$$P\{f(x_j, Z) = x_k\} = P\{F_{j,k-1} < Z \le F_{j,k}\} = P(x_j, x_k).$$
Since we could have used any $x_j$ and $x_k$, we have a random mapping representation for any transition matrix.
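The construction in this proof is easy to implement. The sketch below (ours) builds the thresholds $F_{j,k}$ as cumulative row sums and checks empirically that $P\{f(x_j, Z) = x_k\} = P(x_j, x_k)$ for $Z$ uniform on $[0, 1]$.

```python
import numpy as np

rng = np.random.default_rng(2)

P = np.array([[0.2, 0.5, 0.3],
              [0.6, 0.1, 0.3],
              [0.4, 0.4, 0.2]])

F = np.cumsum(P, axis=1)    # F[j, k] = P(x_j, x_0) + ... + P(x_j, x_k)

def f(j, z):
    """The map of Theorem 2.10: the k with F[j, k-1] < z <= F[j, k]."""
    return int(np.searchsorted(F[j], z, side="left"))

j, trials = 0, 200_000
counts = np.bincount([f(j, rng.uniform()) for _ in range(trials)], minlength=3)
print("empirical:", counts / trials)   # approaches the row P[j]
print("row P[j] :", P[j])
```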

2.3 Coupling

Here, we proceed to one of the key notions of this paper, called coupling, which leads to stationary distributions. Recall that we have defined marginal distributions: for a variable $X$, we can have $P\{X = x\} = \mu(x)$; for another variable $Y$, $P\{Y = y\} = \nu(y)$.

Definition 2.11. A coupling of $\mu$ and $\nu$ is a pair of random variables $(X, Y)$ defined on a single probability space such that the marginal distribution of $X$ is $\mu$ and the marginal distribution of $Y$ is $\nu$. That is, a coupling $(X, Y)$ satisfies $P\{X = x\} = \mu(x)$ and $P\{Y = y\} = \nu(y)$.

Using the familiar example of coin tosses: if we toss two independent fair coins, we get a coupling with $P\{X = x \text{ and } Y = y\} = 1/4$ for all possible pairs $(x, y)$ from $\{0, 1\}^2$.

Definition 2.12. For a probability transition matrix $P$, a distribution $\pi$ on $\Omega$ satisfying $\pi = \pi P$ is a stationary distribution of the Markov chain.

As an example, we can take a look at simple random walk on a graph $G = (V, E)$ with vertex set $V$ and edge set $E$. We denote the number of neighbors of a vertex $x$ by $\deg(x)$, and write $x \sim y$ when $x$ and $y$ are neighbors. The probability that a walker standing on a vertex $y \in V$ moves to a given neighboring vertex $x$ is $\frac{1}{\deg(y)}$. For any vertex $y \in V$, summing over its neighboring vertices $x$, we get
$$\sum_{\substack{x \in V \\ x \sim y}} \deg(x)\,P(x, y) = \sum_{x \sim y} \deg(x)\,\frac{1}{\deg(x)} = \deg(y).$$
With $|E|$ the total number of edges, we can define the probability measure
$$\pi(y) = \frac{\deg(y)}{2|E|},$$
and dividing the display above by $2|E|$ shows that $\pi P = \pi$. Therefore, for any graph, the probability measure $\pi(y)$ is always a stationary distribution for the walk.

Definition 2.13. For $x \in \Omega$, the hitting time for $x$ is
$$\tau_x = \min\{t \ge 0 : X_t = x\},$$
the first time at which the chain visits state $x$. The first return time is
$$\tau_x^+ = \min\{t \ge 1 : X_t = x\};$$
here, we are only considering positive times.

The notion of hitting time permits us to establish further properties of Markov chains with the property of irreducibility.

Definition 2.14. A chain $P$ is called irreducible if for any two states $x, y \in \Omega$ there exists a $t \in \mathbb{Z}^+$ such that $P^t(x, y) > 0$. This means that it is possible to get from any state to any other state using transitions of positive probability.

Theorem 2.15. Let $P$ be the transition matrix of an irreducible Markov chain. Then
(a) there exists a probability distribution $\pi$ on $\Omega$ such that $\pi = \pi P$ and $\pi(x) > 0$ for all $x \in \Omega$;
(b) $\pi(x) = \frac{1}{E_x(\tau_x^+)}$.

Proof. (a) Fix an arbitrary reference state $z$; we will study the states $y$ through their visits before the chain returns to $z$. For each state $y$, let $\tilde\pi(y)$ be the expected number of visits to $y$ before returning to $z$, which is the sum over all times $t$ of the probability that $X_t = y$ while the first return time to $z$ is still larger than $t$:
$$\tilde\pi(y) := E_z(\text{number of visits to } y \text{ before returning to } z) = \sum_{t=0}^{\infty} P_z\{X_t = y,\ \tau_z^+ > t\}. \qquad (3)$$
The measure $\tilde\pi$ is, up to normalization, a stationary distribution if $\tilde\pi P = \tilde\pi$. For any such $y$, since the event $\{\tau_z^+ \ge t + 1\}$ is determined by $X_0, \ldots, X_t$,
$$(\tilde\pi P)(y) = \sum_{x \in \Omega}\sum_{t=0}^{\infty} P_z\{X_t = x,\ \tau_z^+ > t\}\,P(x, y) = \sum_{t=0}^{\infty} P_z\{X_{t+1} = y,\ \tau_z^+ \ge t + 1\}$$
$$= \sum_{t=1}^{\infty} P_z\{X_t = y,\ \tau_z^+ \ge t\} = \tilde\pi(y) - P_z\{X_0 = y,\ \tau_z^+ > 0\} + \sum_{t=1}^{\infty} P_z\{X_t = y,\ \tau_z^+ = t\}.$$
Now, two cases can occur:
if $y = z$, then $P_z\{X_0 = z,\ \tau_z^+ > 0\} = 1$ and $\sum_{t=1}^{\infty} P_z\{X_t = z,\ \tau_z^+ = t\} = 1$;
if $y \ne z$, then $P_z\{X_0 = y,\ \tau_z^+ > 0\} = 0$ and $\sum_{t=1}^{\infty} P_z\{X_t = y,\ \tau_z^+ = t\} = 0$.
Thus, the last two terms above cancel, and $\tilde\pi P = \tilde\pi$.
(b) Since $\sum_x \tilde\pi(x) = E_z(\tau_z^+)$, we can normalize and get, for any $x \in \Omega$,
$$\pi(x) = \frac{\tilde\pi(x)}{E_z(\tau_z^+)}.$$
Taking $z = x$ gives $\tilde\pi(x) = 1$, so $\pi(x) = \frac{1}{E_x(\tau_x^+)}$.
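Both conclusions of Theorem 2.15 can be checked numerically for the two-state chain used earlier; the sketch below (ours) finds $\pi$ as the left eigenvector of $P$ for eigenvalue 1 and then estimates $E_x(\tau_x^+)$ by simulation.

```python
import numpy as np

rng = np.random.default_rng(1)

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Part (a): pi = pi P means pi is a left eigenvector of P with eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi /= pi.sum()
print("pi =", pi, "  pi P =", pi @ P)   # here pi = [5/6, 1/6]

def first_return_time(x):
    """Time of the first return to x for the chain started at x."""
    state, t = x, 0
    while True:
        state = rng.choice(2, p=P[state])
        t += 1
        if state == x:
            return t

# Part (b): pi(x) = 1 / E_x(tau_x^+).
for x in range(2):
    mean_rt = np.mean([first_return_time(x) for _ in range(20_000)])
    print(f"x={x}: 1/(mean return time) = {1/mean_rt:.3f}, pi(x) = {pi[x]:.3f}")
```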

2.4 Stopping and Stationary Times

The following definitions seem trivial, but they become helpful in our study of Markovian processes.

Definition 2.16. For a sequence $(X_t)_{t=0}^{\infty}$ of $\Omega$-valued random variables, a $\{0, 1, \ldots\} \cup \{\infty\}$-valued random variable $\tau$ is a stopping time for $(X_t)$ if for each $t \in \{0, 1, \ldots\}$ there is a set $B_t \subseteq \Omega^{t+1}$ such that
$$\{\tau = t\} = \{(X_0, X_1, \ldots, X_t) \in B_t\}.$$
We can also say that at a stopping time $\tau$, the event $\{\tau = t\}$ is determined by $X_0, \ldots, X_t$. As an example, we can define a stopping time for a market stock: a trader can sell it after it exceeds a certain value, and the time when this happens is a stopping time.

Recall the random mapping representation (Definition 2.9): we can apply the map $f$ to an i.i.d. sequence $(Z_t)_{t=1}^{\infty}$; then the sequence $(X_t)_{t=0}^{\infty}$ defined by
$$X_0 = x, \qquad X_t = f(X_{t-1}, Z_t) \qquad (4)$$
is a Markov chain with transition matrix $P$.

Definition 2.17. A random time $\tau$ is called a randomized stopping time for the Markov chain $(X_t)$ if it is a stopping time for the sequence $(Z_t)$.

Let us take a look at an example: the lazy random walk on the hypercube $\{0,1\}^n$. At each step of this process, an element $(k, B)$ is selected uniformly at random from $\{1, 2, \ldots, n\} \times \{0, 1\}$, and coordinate $k$ is updated with the bit $B$. The chain is determined by the i.i.d. sequence $(Z_t)$, with $Z_t = (K_t, B_t)$ being the coordinate and bit pair used to update at step $t$. Let us define
$$\tau_{\mathrm{ref}} := \min\{t \ge 0 : \{K_1, \ldots, K_t\} = \{1, 2, \ldots, n\}\},$$
the first time when each coordinate has been updated at least once. At this time, all of the coordinates have been replaced with independent fair bits, so the distribution of the chain on $\{0,1\}^n$ is uniform: $X_{\tau_{\mathrm{ref}}}$ is an exact sample from the stationary distribution $\pi$. Since $\tau_{\mathrm{ref}}$ is a function not of $(X_t)$ but of $(Z_t)$, it is a stopping time for $(Z_t)$, so it is a randomized stopping time.
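The sketch below (ours) runs this lazy walk and records $\tau_{\mathrm{ref}}$; by the coupon-collector argument its mean should be close to $n \sum_{k=1}^{n} 1/k \approx n \ln n$.

```python
import random

def refresh_time(n, rng=random):
    """Run the lazy walk on {0,1}^n until every coordinate has been updated."""
    x = [0] * n                  # the starting state does not matter
    touched, t = set(), 0
    while len(touched) < n:
        k = rng.randrange(n)     # coordinate K_t
        b = rng.randrange(2)     # bit B_t
        x[k] = b                 # after this, coordinate k is a fresh fair bit
        touched.add(k)
        t += 1
    return t                     # tau_ref; x is now uniform on {0,1}^n

n, trials = 10, 10_000
mean_tau = sum(refresh_time(n) for _ in range(trials)) / trials
prediction = n * sum(1 / k for k in range(1, n + 1))
print("mean tau_ref:", mean_tau, "  coupon-collector prediction:", prediction)
```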

Definition 2.18. For $(X_t)$ an irreducible Markov chain with stationary distribution $\pi$, a stationary time $\tau$ for $(X_t)$ is a randomized stopping time, possibly depending on the starting position $x$, such that the distribution of $X_\tau$ is $\pi$:
$$P_x\{X_\tau = y\} = \pi(y).$$

Definition 2.19. A strong stationary time for a Markov chain $(X_t)$ with stationary distribution $\pi$ is a randomized stopping time $\tau$, possibly depending on the starting position $x$, such that
$$P_x\{\tau = t,\ X_\tau = y\} = P_x\{\tau = t\}\,\pi(y).$$

This means that for a strong stationary time $\tau$, $X_\tau$ has distribution $\pi$ and is independent of $\tau$. As an example, consider again the lazy random walk on the hypercube; $\tau_{\mathrm{ref}}$ is also a strong stationary time.

Lemma 2.20. Let $(X_t)$ be an irreducible Markov chain with stationary distribution $\pi$. If $\tau$ is a strong stationary time for $(X_t)$, then for all $t \ge 0$,
$$P_x\{\tau \le t,\ X_t = y\} = P_x\{\tau \le t\}\,\pi(y). \qquad (5)$$

Proof. Construct $(X_t)$ from an i.i.d. sequence $Z_1, Z_2, \ldots$ as in (4). Then, for any $s \le t$,
$$P_x\{\tau = s,\ X_t = y\} = \sum_{z \in \Omega} P_x\{X_t = y \mid \tau = s,\ X_s = z\}\,P_x\{\tau = s,\ X_s = z\}. \qquad (6)$$
Now, by the definition of a randomized stopping time, there is a set $B \subseteq \Lambda^s$ for which $\{\tau = s\} = \{(Z_1, \ldots, Z_s) \in B\}$. Moreover, we can define a function $f_r$ for which $X_{s+r} = f_r(X_s, Z_{s+1}, \ldots, Z_{s+r})$. The vectors $(Z_1, \ldots, Z_s)$ and $(Z_{s+1}, \ldots, Z_t)$ are independent, which means that
$$P_x\{X_t = y \mid \tau = s,\ X_s = z\} = P_x\{f_{t-s}(z, Z_{s+1}, \ldots, Z_t) = y \mid (Z_1, \ldots, Z_s) \in B,\ X_s = z\} = P^{t-s}(z, y). \qquad (7)$$
Since $\tau$ is a strong stationary time, $P_x\{\tau = s,\ X_s = z\} = P_x\{\tau = s\}\,\pi(z)$; putting this together with equations (6) and (7), we can see that
$$P_x\{\tau = s,\ X_t = y\} = \sum_{z \in \Omega} P^{t-s}(z, y)\,\pi(z)\,P_x\{\tau = s\} = \pi(y)\,P_x\{\tau = s\}.$$

This further implies that
$$P_x\{\tau \le t,\ X_t = y\} = \sum_{s \le t} P_x\{\tau = s,\ X_t = y\} = \sum_{s \le t} \pi(y)\,P_x\{\tau = s\} = P_x\{\tau \le t\}\,\pi(y).$$

2.5 Time Reversal

The analysis of many processes would be simpler done from the end. For this reason, it is good to see which properties a Markovian process shares with its reversed process.

Definition 2.21. The time reversal of an irreducible Markov chain with transition matrix $P$ and stationary distribution $\pi$ is the chain with transition matrix
$$\hat P(x, y) := \frac{\pi(y)\,P(y, x)}{\pi(x)}.$$

Definition 2.22. For a distribution $\mu$ on a group $G$, the inverse distribution $\hat\mu$ is defined by $\hat\mu(g) := \mu(g^{-1})$ for all elements $g \in G$.

Proposition 2.23. Let $(X_t)$ be an irreducible Markov chain with transition matrix $P$ and stationary distribution $\pi$. Write $(\hat X_t)$ for the time-reversed chain with transition matrix $\hat P$. Then $\pi$ is stationary for $\hat P$, and for any $x_0, x_1, \ldots, x_t \in \Omega$ we have
$$P_\pi\{X_0 = x_0, \ldots, X_t = x_t\} = P_\pi\{\hat X_0 = x_t, \ldots, \hat X_t = x_0\}.$$

Proof. First, $\pi$ is stationary for $\hat P$, since
$$\sum_{x \in \Omega} \pi(x)\,\hat P(x, y) = \sum_{x \in \Omega} \pi(y)\,P(y, x) = \pi(y).$$
For the second claim,
$$P_\pi\{\hat X_0 = x_n, \ldots, \hat X_n = x_0\} = \pi(x_n)\,\hat P(x_n, x_{n-1}) \cdots \hat P(x_2, x_1)\,\hat P(x_1, x_0)$$
$$= \pi(x_0)\,P(x_0, x_1)\,P(x_1, x_2) \cdots P(x_{n-1}, x_n) = P_\pi\{X_0 = x_0, \ldots, X_n = x_n\},$$
where the middle equality follows by expanding each $\hat P$ and cancelling the ratios of $\pi$-values.
So, we know another useful property of Markov chains: a distribution that is stationary for a Markov chain is also stationary for its time reversal.
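A quick numeric check of Definition 2.21 and Proposition 2.23 (our sketch): build $\hat P$ from the two-state chain used earlier and its stationary distribution, and confirm that $\pi \hat P = \pi$.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = np.array([5/6, 1/6])            # stationary distribution of P, found earlier

# Time reversal: P_hat(x, y) = pi(y) * P(y, x) / pi(x).
P_hat = (P.T * pi) / pi[:, None]

print("rows of P_hat sum to 1:", P_hat.sum(axis=1))
print("pi P_hat =", pi @ P_hat, " (equals pi)")
```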

3 Riffle Shuffles

3.1 Gilbert-Shannon-Reeds Model

The Gilbert-Shannon-Reeds model, which we will denote by GSR, is the first mathematically precise model of shuffling. It describes the most common strategy of card shuffling: a deck is cut into two heaps; then a card is dropped from the left or right heap, with probability proportional to the number of cards currently in that heap, until the heaps are empty.

Since the GSR model was the first mathematically precise model of shuffling, its rate of convergence is an important result. It was first derived in a famous paper by Bayer and Diaconis [3], and the following section establishes this result as well. Most authors currently studying this phenomenon present multiple processes for which this model can be used [2], but our analysis focuses on the fundamental convergence property of this model. Knowing these properties also allows performing many card tricks: in addition to the statistical analysis, Diaconis co-wrote a book on their use in magic card tricks [9].

Definition 3.1. For a deck of $n$ cards, define the $a$-shuffle in the following way. Take the stack of cards and cut it into $a$ packets. Then drop the cards from these packets successively onto one big pile, as follows: let $b_i$ be the number of cards remaining in packet $i$ at any moment; then the chance that the next card dropped comes from packet $i$ is
$$\frac{b_i}{\sum_{j=1}^{a} b_j}.$$

Theorem 3.2. For an $a$-shuffle, the probability that it results in a specific permutation $\pi$ is
$$\frac{\binom{a + n - r}{n}}{a^n},$$
where $r$ is the number of rising sequences in $\pi$.

Proof. We can look at this process from the end. At that point, we have $r$ rising sequences. We can choose how to reorder them into the $a$ packets: we must make $r - 1$ cuts that ensure dividing the deck into the $r$ rising sequences that we want; after that, we can place the remaining $a - r$ cuts wherever we like. Counting the ways to place these free cuts among the $n$ cards when we recreate the original sequence, we have $\binom{a + n - r}{n}$ possibilities. The number of possible initial configurations of an $a$-shuffle is $a^n$, since each of the $n$ cards can be in one of the $a$ packets. So, this probability is $\binom{a + n - r}{n}/a^n$.
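Definition 3.1 does not spell out the distribution of the cut, but the count $a^n$ in the proof corresponds to assigning each card independently and uniformly to one of the $a$ packets (a multinomial cut). Under that assumption, the Monte Carlo sketch below (ours) performs $a$-shuffles of a 3-card deck and compares the empirical frequency of each arrangement with the formula of Theorem 3.2.

```python
import random
from math import comb
from collections import Counter
from itertools import permutations

def a_shuffle(deck, a, rng=random):
    """One a-shuffle: multinomial cut into a packets, then proportional drops."""
    n = len(deck)
    sizes = Counter(rng.randrange(a) for _ in range(n))  # multinomial packet sizes
    packets, start = [], 0
    for i in range(a):
        packets.append(list(deck[start:start + sizes[i]]))
        start += sizes[i]
    out = []
    while any(packets):
        u = rng.randrange(sum(len(p) for p in packets))  # next card uniform among
        for p in packets:                                # remaining cards = packet i
            if u < len(p):                               # chosen w.p. b_i / sum b_j
                out.append(p.pop(0))
                break
            u -= len(p)
    return tuple(out)

def rising_sequences(arr):
    """r = 1 + number of card values v whose successor v+1 appears before v."""
    pos = {card: i for i, card in enumerate(arr)}
    return 1 + sum(pos[v + 1] < pos[v] for v in range(1, len(arr)))

n, a, trials = 3, 2, 100_000
freq = Counter(a_shuffle(tuple(range(1, n + 1)), a) for _ in range(trials))
for perm in permutations(range(1, n + 1)):
    r = rising_sequences(perm)
    print(perm, round(freq[perm] / trials, 4), comb(a + n - r, n) / a**n)
```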

Corollary 3.3. If a deck of cards is given a sequence of $m$ shuffles of types $a_1, a_2, \ldots, a_m$, then the chance that the deck is in arrangement $\pi$ is
$$\frac{\binom{n + a - r}{n}}{a^n},$$
for $a = a_1 a_2 \cdots a_m$ and $r$ the number of rising sequences in $\pi$.

Proof. Knowing Theorem 3.2, we can see that given the number of rising sequences $R(\pi)$, the conditional law of $\pi$ is uniform. Now, we use Lemma 1 of the famous paper by Rogers and Pitman [7]: once we know that the family of distributions for the process of these rising sequences is complete, the requirements of this lemma are satisfied and $R(\pi)$ is a Markov chain. So, we want to show that if
$$\sum_{r=1}^{n} \binom{a_m + n - r}{n} f(r) = \sum_{r=1}^{n} \binom{a_m + n - r}{n} g(r) \quad \text{for } m = 0, 1, 2, \ldots, n,$$
then $f = g$. Taking $x = a_m$,
$$\sum_{r=1}^{n} \binom{x + n - r}{n} f(r) = \frac{1}{n!}\big[(x + n - 1)(x + n - 2)\cdots x\,f(1) + (x + n - 2)(x + n - 3)\cdots(x - 1)\,f(2) + \cdots + x(x - 1)\cdots(x - (n - 1))\,f(n)\big],$$
and at $x = i$ every term with $r > i$ vanishes, while the term with $r = i$ contributes exactly $f(i)$; so, proceeding by induction on $i$, the left-hand side determines $f(i)$ at $x = i$. Since the same decomposition holds for the right-hand side of the equation, and the two sides are degree-$n$ polynomials in $x$ that agree at the $n + 1$ points $x = a_m$ and hence everywhere, we can see that $f(i) = g(i)$.

3.2 Approach to Uniformity in the GSR Shuffling Model

In this subsection, we prove how this distribution converges to the uniform one.

Proposition 3.4. Let
$$Q_m(r) = \frac{\binom{2^m + n - r}{n}}{2^{mn}}$$
be the probability of a permutation with $r$ rising sequences after $m$ shuffles from the GSR distribution. Let $r = n/2 + h$ with $|h| \le n/2$, and let $m = \log_2(n^{3/2} c)$ with $0 < c < \infty$ fixed. Then
$$Q_m(r) = \frac{1}{n!}\exp\Big\{\frac{1}{c\sqrt n}\Big({-h} + \frac12 + O_C\big(\tfrac hn\big)\Big) - \frac{1}{24 c^2} - \frac12\Big(\frac{h}{cn}\Big)^2 + O_C\big(\tfrac 1n\big)\Big\}.$$

Proof. We write
$$Q_m(r) = \frac{(2^m + n - r)(2^m + n - r - 1)\cdots(2^m - r + 1)}{n!\,(2^m)^n} = \frac{1}{n!}\exp\Big\{\sum_{i=0}^{n-1}\ln\Big(1 + \frac{n/2 - h - i}{c\,n^{3/2}}\Big)\Big\},$$
using $2^m = c\,n^{3/2}$ and $r = n/2 + h$. Recalling the inequality
$$x - \frac{x^2}{2} + \frac{x^3}{3} - x^4 \;\le\; \ln(1 + x) \;\le\; x - \frac{x^2}{2} + \frac{x^3}{3},$$
valid for $-\frac12 < x < 1$, we can bound the logarithmic terms. We then evaluate all the terms of the expansion of $Q_m$ with the standard summation formulas:
$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$ gives $\frac{1}{c\,n^{3/2}}\sum_{i=0}^{n-1}\big(\frac n2 - h - i\big) = \frac{-h + 1/2}{c\sqrt n}$;
$\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}$ gives $\frac{1}{2 c^2 n^3}\sum_{i=0}^{n-1}\big(\frac n2 - h - i\big)^2 = \frac{1}{24 c^2} + \frac12\big(\frac{h}{cn}\big)^2 + O_c\big(\tfrac 1n\big)$;
$\sum_{i=1}^{n} i^3 = \frac{n^2(n+1)^2}{4}$ gives $\frac{1}{3 c^3 n^{9/2}}\sum_{i=0}^{n-1}\big(\frac n2 - h - i\big)^3 = O_C\big(\tfrac{h}{n^{3/2}}\big)$;
$\sum_{i=0}^{n-1}\big(\frac n2 - h - i\big)^4 = O_c(n^5)$ gives $\frac{1}{c^4 n^6}\sum_{i=0}^{n-1}\big(\frac n2 - h - i\big)^4 = O_c\big(\tfrac 1n\big)$.
Putting these together, we get the estimate above.

Proposition 3.5. Let $h_1$ be the integer such that $Q_m(n/2 + h) \ge 1/n!$ exactly when $h \le h_1$. Then, for any fixed $c$, as $n$ goes to infinity,
$$h_1 = -\frac{\sqrt n}{24 c} + \frac{2c}{3} + B + O_c\Big(\frac{1}{\sqrt n}\Big),$$
with $B$ bounded.

Proof. This bound can be found by looking at Proposition 3.4: its exponent must be nonnegative in order to have $Q_m(n/2 + h) \ge 1/n!$. If we set it equal to zero for some $h$, the resulting expression is the one above.

Theorem 3.6. Let $Q_m$ be the Gilbert-Shannon-Reeds distribution on the symmetric group $S_n$ after $m$ shuffles, and let $U$ be the uniform distribution. Then for $m = \log_2(n^{3/2} c)$, with $0 < c < \infty$ fixed, as $n$ tends to $\infty$,
$$\|Q_m - U\| = 1 - 2\,\Phi\Big({-\frac{1}{4c\sqrt 3}}\Big) + O_C\big(n^{-1/4}\big) \qquad (8)$$
with $\Phi(x) = \int_{-\infty}^{x} e^{-t^2/2}\,dt/\sqrt{2\pi}$.

Proof. We have seen that the number of rising sequences is sufficient information for computing the probability of a permutation. This allows us to use the result of a paper by Diaconis and Zabell [4], which says that the total variation distance between two probabilities is equal to the total variation distance between the induced laws of any sufficient statistic. Then, if we denote the number of permutations with $n/2 + h$ rising sequences by $R_{nh}$,
$$\|Q_m - U\| = \sum_{-n/2 < h \le h_1} R_{nh}\Big(Q_m\big(\tfrac n2 + h\big) - \frac{1}{n!}\Big).$$
The number of descents is crucial here: $\pi$ has $r$ rising sequences if and only if $\pi^{-1}$ has $r - 1$ descents. We can also recall that the Eulerian number $a_{nj}$ denotes the number of permutations with $j$ descents. In his study of Eulerian numbers in [8], Tanny showed that the chance that the sum of $n$ variables that are i.i.d. uniform on $[0,1]$ lies between $j$ and $j + 1$ equals $\frac{a_{nj}}{n!}$. If this is so, then $\frac{a_{nj}}{n!}$ behaves according to the central limit theorem, and the same is then true for $\frac{R_{nh}}{n!}$. Therefore
$$\sum_{h = -n/2}^{h_1} \frac{R_{nh}}{n!} = \Phi\Big({-\frac{1}{4c\sqrt 3}}\Big)\Big(1 + O\big(\tfrac{1}{\sqrt n}\big)\Big) \quad \text{uniformly}.$$
We can also use the local central limit theorem as stated in [6], which, used with $x_n = \frac{h}{\sqrt{n/12}}$, gives
$$\frac{R_{nh}}{n!} = \frac{e^{-x_n^2/2}}{\sqrt{2\pi n/12}}\Big(1 + o\big(\tfrac 1n\big)\Big) \quad \text{uniformly in } h \qquad (9)$$

(the derivation is almost identical to the one done in [6]). Now, we use Proposition 3.5. Its result can conveniently be divided into two zones:
$$A_1 = \Big\{-\frac{10\,n^{3/4}}{c} \le h \le h_1\Big\} \quad \text{and} \quad A_2 = \Big\{-\frac n2 \le h < -\frac{10\,n^{3/4}}{c}\Big\}.$$
Proposition 3.4 and (9) put together imply that
$$\sum_{A_1} R_{nh}\,Q_m(n/2 + h) = \frac{e^{-1/24c^2}}{\sqrt{2\pi n/12}}\sum_{A_1} e^{-\frac12 x_n^2 - \frac{h}{c\sqrt n} + O_c(n^{-1/4})} + o\big(\tfrac 1n\big)$$
$$= \frac{e^{-1/24c^2}}{\sqrt{2\pi}}\int_{-\infty}^{-1/(4c\sqrt 3)} e^{-x^2/2 - x/(2c\sqrt 3)}\,dx\,\big(1 + O(n^{-1/4})\big) = \Phi\Big(\frac{1}{4c\sqrt 3}\Big)\big(1 + O(n^{-1/4})\big),$$
where the last equality follows by completing the square, which cancels the factor $e^{-1/24c^2}$.
Now, for $h$ in $A_2$,
$$Q_m(n/2 + h) \le Q_m(1) \le \frac{e^{\sqrt n/2c}}{n!}.$$
A standard large deviation bound is given in Chapter 6 of [5]; applying it to our sum, uniformly in $n$ we get
$$\sum_{A_2} \frac{R_{nh}}{n!} \le \frac{2\,n^{1/4}}{\frac{10\,n^{1/4}}{c}\sqrt{2\pi}}\exp\Big[-\frac12\Big(\frac{10\,n^{1/4}}{c}\Big)^2\Big].$$
This means that only zone $A_1$ contributes, and the speed of convergence is as described above.

This theorem also allows a corollary:

Corollary 3.7. If $n$ cards are shuffled $m$ times with $m = \frac32\log_2 n + \theta$, then for large $n$,
$$\|Q_m - U\| = 1 - 2\,\Phi\Big({-\frac{2^{-\theta}}{4\sqrt 3}}\Big) + O\big(n^{-1/4}\big),$$
with
$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt.$$

Therefore, if $\theta$ is large, the distance to uniformity approaches 0, while for $\theta$ very negative it approaches 1. We can calculate the variation distances for distinct numbers of cards; then we see that about $\frac32\log_2 n$ shuffles are necessary for shuffling $n$ cards.
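Because the law of the deck depends only on the number of rising sequences, the distance can also be computed exactly for a real deck: $\|Q_m - U\| = \frac12\sum_{r} R_{n,r}\,|Q_m(r) - 1/n!|$, where $R_{n,r}$, the number of permutations with $r$ rising sequences, is the Eulerian number $a_{n,r-1}$. The sketch below (ours) does this for $n = 52$, using the standard Eulerian recurrence.

```python
from fractions import Fraction
from math import comb, factorial

def eulerian(n):
    """A[k] = number of permutations of n elements with k descents."""
    A = [1]
    for size in range(2, n + 1):
        A = [(k + 1) * (A[k] if k < len(A) else 0)
             + (size - k) * (A[k - 1] if k >= 1 else 0)
             for k in range(size)]
    return A

def tv_distance(n, m):
    """Exact total variation distance after m GSR shuffles of n cards."""
    A = eulerian(n)                  # A[r-1] permutations have r rising sequences
    a = 2 ** m                       # m riffle shuffles compose to one 2^m-shuffle
    u = Fraction(1, factorial(n))
    total = Fraction(0)
    for r in range(1, n + 1):
        q = Fraction(comb(a + n - r, n), a ** n)   # Theorem 3.2 with a = 2^m
        total += A[r - 1] * abs(q - u)
    return float(total / 2)

for m in range(1, 11):
    print(m, round(tv_distance(52, m), 4))
```

The output stays essentially at 1 through $m = 4$ and then roughly halves with each extra shuffle, dropping to about $1/3$ at $m = 7$ — the computation behind the popular statement that seven riffle shuffles suffice for 52 cards [3].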

Acknowledgements. This paper could never have appeared without my mentor Mohammad Abbas Rezaei and his patience in correcting my mistakes and explaining to me how to use LaTeX. It also could not have been written without the advice of Prof. Lalley and helpful comments from Jacob Perlman and Marcelo Alvisio.

References

[1] P. Diaconis (1998). From Shuffling Cards to Walking Around the Building: An Introduction to Modern Markov Chain Theory. Doc. Math. J. DMV, 187-204.

[2] D. A. Levin, Y. Peres and E. L. Wilmer. Markov Chains and Mixing Times. American Mathematical Society.

[3] D. Bayer and P. Diaconis (1992). Trailing the Dovetail Shuffle to its Lair. Ann. Appl. Probab. 2, 294-313.

[4] P. Diaconis and S. Zabell (1982). Updating Subjective Probability. J. Amer. Statist. Assoc. 77, 822-830.

[5] W. Feller (1971). An Introduction to Probability Theory and Its Applications. Wiley, New York.

[6] G. F. Lawler and V. Limic. Random Walk: A Modern Introduction. Cambridge University Press.

[7] L. C. G. Rogers and J. Pitman (1981). Markov Functions. Ann. Probab. 9, 573-582.

[8] S. Tanny (1973). A Probabilistic Interpretation of the Eulerian Numbers. Duke Math. J. 40, 717-722.

[9] P. Diaconis and R. Graham (2011). Magical Mathematics: The Mathematical Ideas that Animate Great Magic Tricks. Princeton University Press.