Fastest mixing Markov chain on a path

Stephen Boyd    Persi Diaconis    Jun Sun    Lin Xiao

Revised July 2004

Submitted to The American Mathematical Monthly, October 2003. Authors listed in alphabetical order. Information Systems Laboratory, Department of Electrical Engineering, Stanford University, CA 94305-9510; e-mail addresses: boyd@stanford.edu, sunjun@stanford.edu, lxiao@stanford.edu. Department of Statistics and Department of Mathematics, Stanford University, CA 94305-4065.

Abstract. We consider the problem of assigning transition probabilities to the edges of a path, so the resulting Markov chain or random walk mixes as rapidly as possible. In this note we prove that fastest mixing is obtained when each edge has a transition probability of 1/2. Although this result is intuitive (it was conjectured in [7]), and can be found numerically using convex optimization methods [2], we give a self-contained proof.

In [2], the authors consider the problem of assigning transition probabilities to the edges of a connected graph in such a way that the associated Markov chain mixes as rapidly as possible. We show that this problem can be solved, at least numerically, using tools of convex optimization, in particular semidefinite programming [9, 3]. The present note presents a simple, self-contained example where the optimal Markov chain can be identified analytically.

Consider a path with n ≥ 2 nodes, labeled 1, 2, ..., n, with n − 1 edges connecting pairs of adjacent nodes, and a loop at each node, as shown in figure 1. We consider a Markov chain (or random walk) on this path, with transition probability from node i to node j denoted P_ij. The requirement that transitions can only occur along an edge or loop of the path is equivalent to P_ij = 0 for |i − j| > 1, i.e., P is a tridiagonal matrix. Since the P_ij are transition probabilities, we have P_ij ≥ 0 and Σ_j P_ij = 1, i.e., P is a stochastic matrix. This can be expressed as P1 = 1, where 1 is the vector with all components one. We will consider symmetric transition probabilities, i.e., those that satisfy P_ij = P_ji. Thus, P is a symmetric, (doubly) stochastic, tridiagonal matrix. Since P1 = 1, we have (1^T/n)P = 1^T/n, which means that the uniform distribution, given by 1^T/n, is stationary.

Figure 1: A path with loops at each node, with transition probabilities labeled.
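To make the setup concrete, here is a minimal NumPy sketch (our illustration, not part of the paper): it builds the symmetric, doubly stochastic, tridiagonal transition matrix determined by edge probabilities p_1, ..., p_{n−1}, and checks that the uniform distribution is stationary. The helper name path_chain is ours.

    import numpy as np

    def path_chain(p):
        """Symmetric tridiagonal transition matrix with P[i, i+1] = P[i+1, i] = p[i].

        Requires p[i] >= 0 and p[i] + p[i+1] <= 1 so that every entry of P is
        nonnegative; the loop (holding) probability at each node is whatever is left over.
        """
        p = np.asarray(p, dtype=float)
        P = np.diag(p, 1) + np.diag(p, -1)
        P += np.diag(1.0 - P.sum(axis=1))   # holding probabilities on the diagonal
        return P

    n = 5
    P = path_chain([0.3, 0.4, 0.5, 0.2])    # any admissible edge probabilities
    assert np.allclose(P, P.T)              # symmetric
    assert np.allclose(P.sum(axis=1), 1.0)  # stochastic, hence doubly stochastic
    u = np.ones(n) / n
    assert np.allclose(u @ P, u)            # the uniform distribution is stationary

Any admissible choice of edge probabilities gives a chain with uniform stationary distribution; the question addressed next is which choice mixes fastest.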

The eigenvalues of P are real (since it is symmetric), and no more than one in modulus (since it is stochastic). We denote them in nonincreasing order:

    1 = λ_1(P) ≥ λ_2(P) ≥ ··· ≥ λ_n(P) ≥ −1.

The asymptotic rate of convergence of the Markov chain to the stationary distribution, i.e., its mixing rate, depends on the second-largest eigenvalue modulus (SLEM) of P, which we denote µ(P):

    µ(P) = max_{i=2,...,n} |λ_i(P)| = max{λ_2(P), −λ_n(P)}.

The smaller µ(P) is, the faster the Markov chain converges to its stationary distribution. For example, we have the following bound:

    ‖π(t) − 1^T/n‖_TV ≤ (1/2)√n µ(P)^t,

where π(t) = π(0)P^t is the probability distribution at time t, and ‖·‖_TV denotes the total variation norm. (The total variation distance between two probability distributions π and π̂ is the maximum of |prob_π(S) − prob_π̂(S)| over all subsets S ⊆ {1, 2, ..., n}.) For more background, see, e.g., [6, 4, 1, 2] and references therein.

The question we address is: What choice of P minimizes µ(P) among all symmetric stochastic tridiagonal matrices? In other words, what is the fastest mixing (symmetric) Markov chain on a path? We will show that the transition matrix P⋆ with entries

    P⋆_{i,i+1} = P⋆_{i+1,i} = 1/2,   i = 1, ..., n−1,
    P⋆_{11} = P⋆_{nn} = 1/2,                                     (1)
    P⋆_{ij} = 0 otherwise,

achieves the smallest possible value of µ(P), cos(π/n), among all symmetric stochastic tridiagonal matrices. Thus, to obtain the fastest mixing Markov chain on a path, we assign a probability 1/2 of moving left, a probability 1/2 of moving right, and a probability 1/2 of staying at each of the two end nodes. (For the nodes not at either end, the probability of staying at the node is zero.) This optimal Markov chain is shown in figure 2.

Figure 2: Fastest mixing Markov chain on a path.

For n = 2, we have µ(P⋆) = cos(π/2) = 0, which is clearly the optimal solution; in one step the distribution is exactly uniform, for any initial distribution π(0). For n ≥ 3, P⋆ is the transition matrix one would guess yields fastest mixing; indeed, this was conjectured in [7]. But we are not aware of a simpler proof of its optimality than the one we give below.
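As a quick numerical sanity check (ours, not part of the paper), the following NumPy snippet confirms that the SLEM of P⋆ equals cos(π/n), and that randomly chosen admissible edge probabilities never do better.

    import numpy as np

    def path_chain(p):
        """Symmetric tridiagonal transition matrix with edge probabilities p."""
        p = np.asarray(p, dtype=float)
        P = np.diag(p, 1) + np.diag(p, -1)
        P += np.diag(1.0 - P.sum(axis=1))
        return P

    def slem(P):
        """Second-largest eigenvalue modulus of a symmetric stochastic matrix."""
        eigs = np.sort(np.linalg.eigvalsh(P))   # ascending; eigs[-1] = 1
        return max(eigs[-2], -eigs[0])

    n = 8
    P_star = path_chain(np.full(n - 1, 0.5))
    print(slem(P_star), np.cos(np.pi / n))      # both are about 0.9239

    rng = np.random.default_rng(0)
    for _ in range(1000):
        p = rng.uniform(0.0, 0.5, size=n - 1)   # p_i + p_{i+1} <= 1 holds automatically
        assert slem(path_chain(p)) >= np.cos(np.pi / n) - 1e-9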

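The claim can also be reproduced with the convex-optimization approach of [2]: by the first lemma below, minimizing the spectral norm of P − (1/n)11^T over symmetric stochastic tridiagonal matrices minimizes µ(P), and this is a semidefinite program. The sketch below is our own illustration and assumes the CVXPY package (with an SDP-capable solver such as SCS) is available; it is not code from the paper. The optimal value should agree with cos(π/n) up to solver tolerance.

    import cvxpy as cp
    import numpy as np

    n = 9
    P = cp.Variable((n, n), symmetric=True)
    J = np.ones((n, n)) / n

    constraints = [P >= 0, cp.sum(P, axis=1) == 1]
    # transitions only along edges and loops of the path: P is tridiagonal
    constraints += [P[i, j] == 0
                    for i in range(n) for j in range(n) if abs(i - j) > 1]

    problem = cp.Problem(cp.Minimize(cp.sigma_max(P - J)), constraints)
    problem.solve()

    print("optimal SLEM (numerical):", problem.value)        # about cos(pi/9) = 0.9397
    print("cos(pi/n)               :", np.cos(np.pi / n))
    print("edge probabilities found:", np.round(P.value.diagonal(1), 3))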
Before proceeding, we describe another context where the same mathematical problem arises. We imagine that there is a processor at each node of our path, and that each link represents a direct network connection between the adjacent processors. Processor i has a job queue or load q_i(t) (which we approximate as a positive real number) at time t. The goal is to shift jobs across the links, at each step, in such a way as to balance the load. In other words, we would like to have q_i(t) → q̄ as t → ∞, where q̄ = (1/n) Σ_i q_i(0) is the average of the initial queues. We ignore the reduction in the queues due to processing (or equivalently, assume that the load balancing is done before the processing begins).

We use the following simple scheme to balance the load: at each step, we compute the load imbalance, q_{i+1}(t) − q_i(t), across each link. We then transfer a fraction θ_i ∈ [0, 1] of the load imbalance from the more loaded to the less loaded processor. We must have θ_i + θ_{i+1} ≤ 1, to ensure that we are not asked to transfer more than the load on a processor to its neighbors. It can be shown that if the θ_i are positive, and satisfy θ_i + θ_{i+1} ≤ 1, then this iterative scheme achieves asymptotically balanced loads, i.e., q_i(t) → q̄ as t → ∞. The problem is to find the fractions θ_i that result in the fastest possible load balancing.

It turns out that this optimal iterative load balancing problem is identical to the problem of finding the fastest mixing Markov chain on a path, with P_{i,i+1} = θ_i. In particular, the evolution of the loads at the processors is given by q(t) = P^t q(0). The speed of convergence of q(t) to q̄1 is given by the second-largest eigenvalue modulus µ(P). By the basic result in this paper, the fastest possible load balancing is accomplished by shifting one-half of the load imbalance on each edge from the more loaded to the less loaded processor. More discussion of this load balancing problem can be found in [7]; a small simulation of the scheme is sketched below.
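For concreteness, here is a small NumPy simulation (ours) of the diffusive load-balancing scheme with the optimal fractions θ_i = 1/2: the queue vector q(t) = P⋆^t q(0) converges to the constant vector q̄1, at a rate governed by µ(P⋆) = cos(π/n).

    import numpy as np

    n = 6
    theta = np.full(n - 1, 0.5)                  # transfer half the imbalance on each link
    P = np.diag(theta, 1) + np.diag(theta, -1)
    P += np.diag(1.0 - P.sum(axis=1))            # this is the optimal chain P* of (1)

    q = np.array([12.0, 0.0, 3.0, 0.0, 0.0, 0.0])    # arbitrary initial loads
    qbar = q.mean()                                  # 2.5
    for t in range(80):
        q = P @ q                                    # one balancing round: q(t+1) = P q(t)
    print(np.round(q, 4))                            # all entries close to qbar = 2.5
    print("convergence factor per step:", np.cos(np.pi / n))   # about 0.866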

We now proceed to prove the basic result.

Lemma. Let P ∈ R^{n×n} be a symmetric stochastic matrix. Then we have µ(P) = ‖P − (1/n)11^T‖_2, where ‖·‖_2 denotes the spectral norm (maximum singular value).

Proof. To see this, we note that 1 is the eigenvector of P associated with the eigenvalue λ_1 = 1. Therefore the eigenvalues of P − 11^T/n are 0, λ_2, ..., λ_n. Since P − 11^T/n is symmetric, its spectral norm is equal to the maximum magnitude of its eigenvalues, i.e., max{λ_2, −λ_n}, which is µ(P).

Lemma. Let P ∈ R^{n×n} be a symmetric stochastic matrix, and suppose y, z ∈ R^n satisfy

    1^T y = 0,   ‖y‖_2 = 1,                             (2)
    (z_i + z_j)/2 ≤ y_i y_j   for P_ij ≠ 0.             (3)

Then we have µ(P) ≥ 1^T z.

Proof. For any P, y and z that satisfy the assumptions in the lemma, we have

    µ(P) = ‖P − (1/n)11^T‖_2
         ≥ y^T (P − (1/n)11^T) y
         = y^T P y
         = Σ_{i,j} P_ij y_i y_j
         ≥ Σ_{i,j} (1/2)(z_i + z_j) P_ij
         = (1/2)(z^T P 1 + 1^T P z)
         = 1^T z.

The first inequality follows from the assumption ‖y‖_2 = 1 and the first lemma. The second inequality follows from the assumption (3), and P_ij ≥ 0.

Theorem. The matrix P⋆, given in (1), attains the smallest value of µ, cos(π/n), among all symmetric stochastic tridiagonal matrices.

Proof. The result is clear for n = 2. We assume now that n > 2. The eigenvalues and associated orthonormal eigenvectors of P⋆ are

    λ_1 = 1,   v_1 = (1/√n) 1,
    λ_j = cos((j−1)π/n),   v_j(k) = √(2/n) cos((2k−1)(j−1)π/(2n)),   j = 2, ..., n,   k = 1, ..., n.

(See, e.g., [8, §16.3].) Therefore we have µ(P⋆) = λ_2 = −λ_n = cos(π/n). We show that this is the smallest µ possible by constructing a pair y and z that satisfy the assumptions (2) and (3) in the second lemma, for any symmetric tridiagonal stochastic matrix P, with 1^T z = cos(π/n). We take y = v_2, so the assumptions (2) in the second lemma clearly hold. We take z to be

    z_i = (1/n) [ cos(π/n) + cos((2i−1)π/n) / cos(π/n) ],   i = 1, ..., n.

It is easy to verify that 1^T z = cos(π/n). It remains to check that y and z satisfy (3) for any symmetric tridiagonal matrix P. Let's first check the superdiagonal entries. For i = 1, ..., n−1, we have

    (z_i + z_{i+1})/2 = (1/n) [ cos(π/n) + (1/2)(cos((2i−1)π/n) + cos((2i+1)π/n)) / cos(π/n) ]
                      = (1/n) [ cos(π/n) + cos(2iπ/n) ]
                      = (2/n) cos((2i−1)π/(2n)) cos((2i+1)π/(2n))
                      = y_i y_{i+1}.

Therefore equality always holds for the superdiagonal (and subdiagonal) entries. For the diagonal entries, we need to check (z_i + z_i)/2 = z_i ≤ y_i², i.e.,

    (1/n) [ cos(π/n) + cos((2i−1)π/n) / cos(π/n) ] ≤ (2/n) cos²((2i−1)π/(2n)) = (1/n) [ 1 + cos((2i−1)π/n) ]

for i = 1, ..., n, which is equivalent to

    [ 1 − cos(π/n) ] [ 1 − cos((2i−1)π/n) / cos(π/n) ] ≥ 0,   i = 1, ..., n.

But this is certainly true because

    cos((2i−1)π/n) ≤ cos(π/n),   i = 1, ..., n.

This completes the proof.
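As an informal numerical check of this construction (ours, not in the paper), the snippet below forms y = v_2 and z for a given n and verifies conditions (2) and (3) together with 1^T z = cos(π/n).

    import numpy as np

    n = 7
    i = np.arange(1, n + 1)
    y = np.sqrt(2.0 / n) * np.cos((2 * i - 1) * np.pi / (2 * n))      # y = v_2
    z = (np.cos(np.pi / n)
         + np.cos((2 * i - 1) * np.pi / n) / np.cos(np.pi / n)) / n

    assert abs(y.sum()) < 1e-12 and abs(y @ y - 1.0) < 1e-12          # condition (2)
    assert abs(z.sum() - np.cos(np.pi / n)) < 1e-12                   # 1^T z = cos(pi/n)
    assert np.allclose((z[:-1] + z[1:]) / 2, y[:-1] * y[1:])          # superdiagonal: equality
    assert np.all(z <= y**2 + 1e-12)                                  # diagonal: z_i <= y_i^2
    print("conditions (2) and (3) hold, so mu(P) >= cos(pi/n) =", np.cos(np.pi / n))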

References

[1] D. Aldous and J. Fill. Reversible Markov Chains and Random Walks on Graphs. stat-www.berkeley.edu/users/aldous/rwg/book.html, 2003. Forthcoming book.

[2] S. Boyd, P. Diaconis, and L. Xiao. Fastest mixing Markov chain on a graph. To appear in SIAM Review, problems and techniques section, 2004. Available at www.stanford.edu/~boyd/fmmc.html.

[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. Available at www.stanford.edu/~boyd/cvxbook.html.

[4] P. Brémaud. Markov Chains, Gibbs Fields, Monte Carlo Simulation and Queues. Texts in Applied Mathematics. Springer-Verlag, Berlin-Heidelberg, 1999.

[5] G. Cobb and Y. Chen. An application of Markov chain Monte Carlo to community ecology. The American Mathematical Monthly, 110(4):265–288, 2003.

[6] P. Diaconis and D. Stroock. Geometric bounds for eigenvalues of Markov chains. The Annals of Applied Probability, 1(1):36–61, 1991.

[7] R. Diekmann, S. Muthukrishnan, and M. V. Nayakkankuppam. Engineering diffusive load balancing algorithms using experiments. In Lecture Notes in Computer Science, volume 1253, pages 111–122. Springer Verlag, 1997.

[8] W. Feller. An Introduction to Probability Theory and Its Applications, volume I. Wiley, New York, 3rd edition, 1968.

[9] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38(1):49–95, 1996.