CSE525: Randomized Algorithms and Probabilistic Analysis, May 16, Lecture 13


Lecturer: Anna Karlin    Scribe: Noah Siegel, Jonathan Shi

Random walks and Markov chains

This lecture discusses Markov chains, which capture and formalize the idea of a memoryless random walk on a finite number of states, and which have wide applicability as a statistical model of many phenomena. A Markov chain is postulated to have a set of possible states, and to transition randomly from one state to a next state, where the probability of transitioning to a particular next state depends only on, and is determined fully by, the state it is currently in. This property of being memoryless, with the next state's probabilities determined only by the current state, is called the Markov property.

Random-walk-based algorithm

Consider 2-SAT, the problem of determining whether or not a boolean formula in conjunctive normal form with at most two literals per clause has a satisfying assignment. The following algorithm is based on a random walk. First start with an arbitrary assignment to the $n$ variables, and then, up to $2n^2$ times, if the formula is not satisfied, choose a random unsatisfied clause and a random variable from that clause, and flip the value of that variable. If at any point during the $2n^2$ iterations the formula is satisfied, return that satisfying assignment; else return "unsatisfiable".

In order to analyze this algorithm, we'll consider how likely it is to stumble upon a particular satisfying assignment $S$ during the $2n^2$ iterations. For each iteration $t$, consider $X_t$, the number of variables that agree with $S$ immediately before the $t$th iteration of the algorithm. The algorithm can then be projected to a random walk on $X_t$: each time a variable is flipped, it either increases or decreases $X_t$ by one. The algorithm succeeds if $X_t$ ever reaches $n$. Since the flipped variable at each step is one of two variables in an unsatisfied clause, and since every clause is satisfied in $S$, there is at least a one-half chance that the flipped variable started out disagreeing with $S$ and flipped into a state that agreed with $S$. We express this as:

$$\Pr[X_{t+1} = i + 1 \mid X_t = i] \ge \frac{1}{2}.$$

Consider the pessimal version of this walk, where $\Pr[X_{t+1} = i + 1 \mid X_t = i]$ is exactly one half at every step (unless $X_t = 0$, in which case $X_{t+1} = 1$ with certainty). This gives the process the property of being memoryless: the transitions at each step are independent of the history of previous states, so this pessimal version is a Markov chain. Also assume that $X_0 = 0$. We will show that in this case, $E[\min\{t : X_t = n\}] = n^2$. This implies that, if the formula is satisfiable, the expected number of iterations it takes for the algorithm to succeed is at most $n^2$, so by doing $2n^2$ iterations, we obtain (by Markov's inequality) a chance of at most $1/2$ of this algorithm reporting a false negative.

To show that $E[\min\{t : X_t = n\}] = n^2$, let $h_j$ be the expected number of steps to reach $n$ on our random walk when we start at $j$. We get a straightforward recurrence by expanding $h_j$ in terms of the number of steps to reach $n$ one step after we leave $j$:

$$h_j = 1 + \frac{1}{2} h_{j-1} + \frac{1}{2} h_{j+1} \implies h_j - h_{j+1} = h_{j-1} - h_j + 2.$$

Using the base case that $h_0 - h_1 = 1$ (from state 0 the walk always moves to state 1), a simple induction yields $h_j - h_{j+1} = 2j + 1$. Then, using $h_n = 0$, telescoping gives us:

$$h_0 = h_n + \sum_{j=0}^{n-1} (h_j - h_{j+1}) = \sum_{j=0}^{n-1} (2j + 1) = 2 \cdot \frac{n(n-1)}{2} + n = n^2.$$
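To make the procedure concrete, here is a minimal Python sketch of the random-walk algorithm just described (the function name and the literal encoding are our own, chosen for this example):

import random

def random_walk_2sat(clauses, n, flips=None):
    """Random-walk 2-SAT sketch. A clause is a pair of nonzero ints;
    literal k means variable |k|, negated if k < 0 (variables are 1..n).
    Returns a satisfying assignment, or None after 2n^2 flips
    (a false negative with probability at most 1/2 if satisfiable)."""
    if flips is None:
        flips = 2 * n * n
    assign = {v: random.choice([True, False]) for v in range(1, n + 1)}

    def satisfied(clause):
        return any((lit > 0) == assign[abs(lit)] for lit in clause)

    for _ in range(flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign                          # formula satisfied
        lit = random.choice(random.choice(unsat))  # random variable of a random unsatisfied clause
        assign[abs(lit)] = not assign[abs(lit)]    # flip it
    return assign if all(satisfied(c) for c in clauses) else None

# Example: (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(random_walk_2sat([(1, 2), (-1, 3), (-2, -3)], n=3))

Rescanning all clauses on every step keeps the sketch short; a more careful implementation would maintain the set of unsatisfied clauses incrementally.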

Finite Markov chains

A finite Markov chain is a random walk on a directed graph. The directed graph has its edges labeled with transition probabilities, in such a way that the law of total probability holds (i.e., for each vertex, the sum of its outgoing edge labels is exactly 1). Each vertex in this graph is called a state of the Markov chain. A Markov chain can equivalently be described by a transition matrix $P$, where $P_{ij}$ is the probability of transitioning to state $j$ if the preceding state is $i$. Thus if $X_t$ denotes the state of the Markov chain at time $t$, we write $\Pr[X_{t+1} = j \mid X_t = i] = P_{ij}$. The property that $P$ satisfies the law of total probability means that it is a stochastic matrix.

[Figure 1: a Markov chain, depicted in graph form. Figure 2: the same Markov chain as a transition matrix.]

Let $p^{(t)}$ denote the probability distribution for the state of the Markov chain at time $t$, so that $p^{(t)}_i = \Pr[X_t = i]$. Then $p^{(0)}$ gives the initial probability distribution of the Markov chain: $p^{(0)} = (1, 0, \dots, 0)$ means to start in state 1, and $p^{(0)} = (\frac{1}{n}, \dots, \frac{1}{n})$ means to start in a uniformly random state. The matrix formulation of a Markov chain gives us the following equivalences:

$$p^{(t+1)} = p^{(t)} P, \qquad p^{(t)} = p^{(0)} P^t.$$

Definition. An irreducible Markov chain is one whose corresponding graph is strongly connected. The period of a state $i$ in a Markov chain is the greatest common divisor of all $n$ such that $(P^n)_{ii} > 0$. A Markov chain is aperiodic if the period of every state is 1. For example, a bipartite Markov chain is never aperiodic, and given any Markov chain $P$, we can construct an aperiodic one by taking $P' = \frac{1}{2}(I + P)$.

All the Markov chains we'll consider will be finite, irreducible, and aperiodic. This implies that there is some $N > 0$ such that for all $n \ge N$, the entries of the matrix $P^n$ are strictly positive (i.e., nonzero).

Stationary distributions

Definition. A probability distribution $\pi$ is a stationary distribution of a Markov chain with transition matrix $P$ if $\pi = \pi P$. Consequently, for all states $j$, $\pi_j = \sum_i \pi_i P_{ij}$.

A stationary distribution of $P$ can be thought of as a fixed point, or as a left eigenvector of eigenvalue 1. The matrix $P$ may have eigenvalues not equal to 1, but since the law of total probability must be satisfied, the corresponding eigenvectors will not be probability distributions (they must have negative components).

The following technical definitions and lemma will be needed to prove the Fundamental Theorem of Markov Chains later on:

Definition. The hitting time $T_{ij}$ is $\min\{t > 0 : X_t = j\}$ given that $X_0 = i$. That is, it is a random variable tracking the number of steps it takes to get to $j$ from $i$. Let $h_{ij} = E[T_{ij}]$. Then $h_{ii}$ is the expected time of first return to state $i$ after starting from state $i$.

Lemma. For all states $x$ and $y$ in an irreducible Markov chain, $h_{xy} < \infty$.

Proof. There are $r \in \mathbb{N}$ and $\varepsilon > 0$ such that for all states $z$ and $w$ and for some $j \le r$, it holds that $(P^j)_{zw} > \varepsilon$. This is true since we can guarantee a nonzero uniform lower bound for all nonzero entries in $\{(P^j)_{zw}\}$ just because there are a finite number of such entries, and we can guarantee that there exists a nonzero $(P^j)_{zw}$ for each pair of states $z, w$ by taking a high enough $r$, because of irreducibility. So consider a random walk on the Markov chain starting at $x$. Within the first $r$ steps, there is at least an $\varepsilon$ chance of the walk having reached $y$. Suppose it doesn't: then the walk is at some state $x' \ne y$ after the $r$th step. There is then at least an $\varepsilon$ chance of reaching $y$ in $r$ more steps starting from $x'$. Continuing on in this fashion, the expected value of $T_{xy}$ is at most:

$$h_{xy} \le r + (1-\varepsilon) r + (1-\varepsilon)^2 r + \cdots = r \sum_{k=0}^{\infty} (1-\varepsilon)^k = \frac{r}{\varepsilon}.$$

Theorem (Fundamental Theorem of Markov Chains). For any finite, irreducible, aperiodic Markov chain:
(1) There exists some stationary distribution $\pi$, and $\pi_x > 0$ for all states $x$.
(2) $\pi$ is unique.
(3) $\pi_i = 1/h_{ii}$.
(4) For all states $i, j$: $\lim_{t \to \infty} (P^t)_{ij} = \pi_j$.

Proof of (1). Fix a start state, which we label 0. For a random walk started at 0, define $\tilde{\pi}_y$ to be the expected number of times the walk visits state $y$ before returning to 0, and let $T$ be the first return time to 0. Then:

$$\tilde{\pi}_y = \sum_{t=0}^{\infty} \Pr[X_t = y, \; T > t] \le \sum_{t=0}^{\infty} \Pr[T > t] = h_{00} < \infty.$$

We claim that $\tilde{\pi}$ is a fixed point of $P$. We verify this by calculating the $y$th component of $\tilde{\pi} P$, first by expanding the definition of $\tilde{\pi}_x$, then by substituting $\sum_x \Pr[X_t = x, \; T > t] \, P_{xy} = \Pr[X_{t+1} = y, \; T \ge t+1]$:

$$(\tilde{\pi} P)_y = \sum_x \tilde{\pi}_x P_{xy} = \sum_x \left( \sum_{t=0}^{\infty} \Pr[X_t = x, \; T > t] \right) P_{xy} = \sum_{t=0}^{\infty} \Pr[X_{t+1} = y, \; T \ge t+1] = \sum_{t=1}^{\infty} \Pr[X_t = y, \; T \ge t]$$
$$= \tilde{\pi}_y - \Pr[X_0 = y, \; T > 0] + \sum_{t=1}^{\infty} \Pr[X_t = y, \; T = t] = \tilde{\pi}_y - \Pr[X_0 = y] + \Pr[X_T = y] = \tilde{\pi}_y,$$

where the last equality is because we set $X_0 = 0$, and so also $X_T = 0$ since $T$ is the time it takes to get back to 0; hence $\Pr[X_0 = y]$ and $\Pr[X_T = y]$ are equal for every $y$. So we conclude that $\tilde{\pi} P = \tilde{\pi}$, and all we have to do to get our stationary distribution is to normalize $\tilde{\pi}$:

$$\pi_x = \frac{\tilde{\pi}_x}{\sum_y \tilde{\pi}_y} = \frac{\tilde{\pi}_x}{h_{00}}.$$
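The construction in this proof is easy to check numerically. The following sketch (Python with numpy; the 3-state chain is a made-up example) estimates $\tilde{\pi}_y$ by simulating excursions from state 0, and compares the normalized result with the stationary distribution computed as a left eigenvector of eigenvalue 1:

import random
import numpy as np

# A made-up 3-state chain; each row sums to 1, so P is stochastic.
P = [[0.5, 0.25, 0.25],
     [0.2, 0.6,  0.2 ],
     [0.3, 0.3,  0.4 ]]

def excursion_visits(P, start=0):
    # Count visits to each state from time 0 until just before the
    # first return to `start`, so the counts sum to the return time T.
    visits = [0] * len(P)
    state = start
    while True:
        visits[state] += 1
        state = random.choices(range(len(P)), weights=P[state])[0]
        if state == start:
            return visits

trials = 200_000
pi_tilde = np.zeros(len(P))
for _ in range(trials):
    pi_tilde += excursion_visits(P)
pi_tilde /= trials
print("estimated h_00  =", pi_tilde.sum())            # sum of pi~ is h_00
print("pi~ normalized  =", pi_tilde / pi_tilde.sum())

# Stationary distribution: left eigenvector of P with eigenvalue 1.
evals, evecs = np.linalg.eig(np.array(P).T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
print("pi (eigenvector)=", pi / pi.sum())

Up to sampling error the two printed vectors agree, and the estimate of $h_{00}$ matches $1/\pi_0$, which also previews part (3) of the theorem.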

Proof of (2). Suppose $\pi$ and $\pi'$ are two stationary distributions of $P$; we'll show that $\pi = \pi'$. Let $x = \operatorname{argmin}_y \, \pi_y / \pi'_y$, and let $c = \pi_x / \pi'_x$, so that $\pi_y \ge c \, \pi'_y$ for all $y$. Then:

$$c \, \pi'_x = \pi_x = \sum_y \pi_y P_{yx} \ge \sum_y c \, \pi'_y P_{yx} = c \sum_y \pi'_y P_{yx} = c \, \pi'_x.$$

We substituted using $\pi_y \ge c \, \pi'_y$ and found that that inequality is actually an equality, so $\pi_y = c \, \pi'_y$ for every $y$ with $P_{yx} > 0$; repeating the argument and using irreducibility gives $\pi = c \, \pi'$ everywhere. Therefore, since $\pi$ and $\pi'$ are both normalized and $\pi = c \, \pi'$, we must have $c = 1$ and $\pi = \pi'$.

Proof of (3). Remember from our construction of the stationary distribution $\pi$ that:

$$\pi_x = \frac{\tilde{\pi}_x}{\sum_y \tilde{\pi}_y} = \frac{\tilde{\pi}_x}{h_{00}}.$$

By uniqueness, we could have run the same construction with $x$ as the start state and come up with the same value for $\pi_x$. So then:

$$\pi_x = \frac{\tilde{\pi}_x}{h_{xx}} = \frac{E_x[\text{number of visits to } x \text{ before returning to } x]}{h_{xx}} = \frac{1}{h_{xx}},$$

since a walk started at $x$ visits $x$ exactly once (at time 0) before its first return.

Proof of (4). Because $P$ is irreducible and aperiodic, there exists an $r$ such that all the entries of $P^r$ are greater than zero. Thus, for some sufficiently small $\delta > 0$, $(P^r)_{xy} \ge \delta \pi_y$ for all states $x, y$. Let $\Pi$ be the matrix formed by using $\pi$ as each of its rows, so that:

$$\Pi = \begin{pmatrix} \pi \\ \pi \\ \vdots \\ \pi \end{pmatrix}.$$

Then we can define a stochastic matrix $Q$ by setting:

$$P^r = \delta \Pi + (1 - \delta) Q,$$

where $Q = (P^r - \delta \Pi)/(1 - \delta)$ must be stochastic: its entries are nonnegative by our choice of $\delta$, and its rows sum to 1 because the rows of $P^r$ and $\Pi$ do. Now, two properties of any stochastic matrix $M$ are that $M \Pi = \Pi$, and that if $\pi M = \pi$ then $\Pi M = \Pi$. To show that $\lim_{t \to \infty} (P^t)_{xy} = \pi_y$ for all $x, y$, we will show that:

$$P^{rk} = \left(1 - (1-\delta)^k\right) \Pi + (1-\delta)^k Q^k.$$

This proceeds by induction on $k$. Since we can write $P^{rk} - \Pi = (1-\delta)^k (Q^k - \Pi)$, and $\Pi P = \Pi$, we can multiply both sides by $P^r$ and get:

$$P^{rk + r} - \Pi = (1-\delta)^k \left( Q^k P^r - \Pi \right) = (1-\delta)^k \left( Q^k (\delta \Pi + (1-\delta) Q) - \Pi \right) = (1-\delta)^k \left( \delta Q^k \Pi + (1-\delta) Q^{k+1} - \Pi \right)$$
$$= (1-\delta)^k \left( (1-\delta) Q^{k+1} - (1-\delta) \Pi \right) = (1-\delta)^{k+1} \left( Q^{k+1} - \Pi \right),$$

so that:

$$P^{r(k+1)} = \left(1 - (1-\delta)^{k+1}\right) \Pi + (1-\delta)^{k+1} Q^{k+1}.$$

Since $(1-\delta)^k \to 0$ and the entries of $Q^k - \Pi$ are bounded, $P^{rk} \to \Pi$ entrywise as $k \to \infty$; the intermediate powers converge too, since $P^{rk+s} - \Pi = (P^{rk} - \Pi) P^s$. Hence $\lim_{t \to \infty} (P^t)_{xy} = \pi_y$.
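The decomposition used in the proof of (4) can likewise be checked numerically. This sketch (Python with numpy, reusing the made-up chain from above) computes the largest valid $\delta$, forms $Q$, and watches $\max_{x,y} |(P^{rk})_{xy} - \pi_y|$ decay like $(1-\delta)^k$:

import numpy as np

P = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.6,  0.2 ],
              [0.3, 0.3,  0.4 ]])

# Stationary distribution pi, and the matrix Pi whose rows are all pi.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi /= pi.sum()
Pi = np.tile(pi, (len(P), 1))

r = 1                          # P itself already has all entries positive here
Pr = np.linalg.matrix_power(P, r)
delta = (Pr / Pi).min()        # largest delta with (P^r)_xy >= delta * pi_y
Q = (Pr - delta * Pi) / (1 - delta)
print("rows of Q sum to 1:", Q.sum(axis=1))
print("Q is nonnegative:  ", (Q >= -1e-12).all())

for k in (1, 5, 10, 20):
    gap = np.abs(np.linalg.matrix_power(P, r * k) - Pi).max()
    print(f"k={k}: max|P^(rk) - Pi| = {gap:.2e}, (1-delta)^k = {(1 - delta) ** k:.2e}")

The observed gap stays below $(1-\delta)^k$, exactly as the induction predicts.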

Maximum Matching in Regular Bipartite Graphs

Hall's marriage theorem guarantees that regular bipartite graphs (bipartite graphs in which all vertices have the same degree) always have a perfect matching. In this section, we will give an algorithm for finding a perfect matching of a regular bipartite graph $G$ with $n$ nodes on each side and $m$ edges in $O(n \log n)$ expected time.

A partial matching of $G$ is a set of edges with no shared nodes. Given a partial matching, an augmenting path for that matching is a path which starts and ends at two distinct unmatched nodes and alternates between edges in the matching and edges outside of it.

Theorem. A matching $M$ is maximum if and only if there is no augmenting path for $M$.

Proof. If there is an augmenting path for $M$, then we can use it to construct a bigger matching. Each node on the path except for the two ends is an endpoint of a matched edge on the path, and the ends of the path are two distinct unmatched nodes, so each node is visited exactly once. Therefore, since the matched and unmatched edges of the augmenting path alternate, we can swap them: remove the path's matched edges from $M$ and add its unmatched edges, and the result is still a matching. Since the path started and ended with unmatched edges, there is one more edge in this new matching than in the original matching, making it larger than $M$ and meaning $M$ is not maximum. Therefore, $M$ is maximum only if there is no augmenting path.

Next, we want to show that if there is no augmenting path for $M$, then $M$ is maximum. Assume, for the contrapositive, that $M$ is not maximum. Then there exists some larger matching $M'$ with $|M'| > |M|$. Since $M'$ contains at least one more edge than $M$, there are at least two more matched vertices in $M'$ than in $M$. Start at some vertex matched in $M'$ but not in $M$, and follow edges matched in one of the two matchings (the path must alternate between them, since each vertex is matched to at most one other vertex in each matching) until either reaching a vertex matched in $M'$ but not in $M$, or vice versa. If the final vertex is matched in $M'$ but not $M$, then this is an augmenting path for $M$. If the final vertex is matched in $M$ but not $M'$, then this path has the same number of vertices matched only in $M$ as matched only in $M'$. Since we know $M'$ has more vertices matched than $M$, we will always eventually find an augmenting path for $M$.

The traditional approach to finding a perfect matching of $G$ is the augmenting path algorithm: repeatedly find an augmenting path alternating between matched and unmatched edges until no augmenting path can be found. This algorithm runs in $O(mn)$ time. We will use a randomized algorithm to improve on this, running in expected time $O(n \log n)$. We will still use the augmenting path strategy, but instead of performing breadth-first search to find an augmenting path, we will use a random walk.

On each iteration of the algorithm, we start with a partial matching $M$. Our graph is bipartite, so let $L$ be one of the independent sets of nodes and $R$ the other; we will refer to $L$ as being on the left and $R$ on the right. Construct a directed graph $G'$ corresponding to our original graph $G$ by having all edges in $M$ go from $R$ to $L$, and all other edges go from $L$ to $R$. Add two vertices $s$ and $t$ such that there is an edge from $s$ to each unmatched vertex in $L$ for each edge out of it, and an edge from each unmatched vertex in $R$ to $t$ for each edge into it. Add edges from $t$ to $s$ equal in number to the edges out of $s$ (which is equal to the number of edges into $t$). Contract each edge in $M$ to a single vertex carrying the edges of both endpoints. By this construction, and because $G$ is regular, each vertex in $G'$ has the same in-degree as out-degree, so $G'$ is Eulerian. $G$ has an augmenting path if and only if there is a path from an unmatched vertex in $L$ to an unmatched vertex in $R$ in $G'$, or equivalently, by our construction, if and only if $G'$ has a cycle through $s$.

Our algorithm will be to do a random walk starting from $s$ until it returns to $s$. The expected time to find an augmenting path is the expected travel time from $s$ back to $s$, $E[T_{s,s}] = h_{s,s}$. As we showed earlier, $h_{s,s} = 1/\pi_s$.

Claim. In an Eulerian directed graph, the stationary distribution of the random walk is $\pi_v = d_{\text{out}}(v)/m$, where $d_{\text{out}}(v)$ is the out-degree of vertex $v$ (which is equal to its in-degree) and $m$ is the total number of edges in the graph.

Proof. In the stationary distribution, $\pi_v$ is the sum of the in-flow to $v$ from all vertices:

$$\pi_v = \sum_{u : (u,v) \in E} \pi_u \, p_{uv} = \sum_{u : (u,v) \in E} \frac{d_{\text{out}}(u)}{m} \cdot \frac{1}{d_{\text{out}}(u)} = \frac{d_{\text{in}}(v)}{m} = \frac{d_{\text{out}}(v)}{m}.$$

Therefore, $h_{s,s} = m'/d_{\text{out}}(s)$, where $m'$ is the number of edges of $G'$. In each iteration of the algorithm, we add an edge to our matching. In iteration $i$, there are $i$ edges already in the matching, meaning that $d_{\text{out}}(s) = (n - i)d$, where $d$ is the degree of $G$; and $G'$ has $m' \le 4dn$ edges (at most $dn$ edges of $G$, plus $(n-i)d$ edges each out of $s$, into $t$, and from $t$ to $s$). Then:

$$h_{s,s} \le \frac{4dn}{d(n-i)} = \frac{4n}{n-i}.$$

Since we must find a cycle in each of the $n$ iterations, the total expected running time of the algorithm is at most:

$$\sum_{i=0}^{n-1} \frac{4n}{n-i} = 4n \sum_{k=1}^{n} \frac{1}{k} = O(n \log n).$$

The algorithm runs in $O(n \log n)$ expected time, as desired.
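The claim that drives this bound is easy to verify numerically. Below is a small Python/numpy check on a made-up Eulerian directed multigraph (in-degree equals out-degree at every vertex), confirming that $\pi_v = d_{\text{out}}(v)/m$:

import numpy as np

# A made-up Eulerian directed multigraph on 3 vertices.
edges = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1), (2, 0), (0, 2)]
n, m = 3, len(edges)

d_out = np.zeros(n)
for u, v in edges:
    d_out[u] += 1

# Random walk: at u, traverse one of u's outgoing edges uniformly at random.
P = np.zeros((n, n))
for u, v in edges:
    P[u, v] += 1 / d_out[u]

evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi /= pi.sum()
print("pi        =", pi)          # (0.375, 0.25, 0.375)
print("d_out / m =", d_out / m)   # matches pi

Since $\pi_s = d_{\text{out}}(s)/m'$, this also confirms the step $h_{s,s} = 1/\pi_s = m'/d_{\text{out}}(s)$ used in the running-time bound above.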