CSE525: Randomized Algorithms and Probabilistic Analysis, May 16, Lecture 13


Lecturer: Anna Karlin    Scribe: Noah Siegel, Jonathan Shi

Random walks and Markov chains

This lecture discusses Markov chains, which capture and formalize the idea of a memoryless random walk on a finite number of states, and which have wide applicability as a statistical model of many phenomena. A Markov chain is postulated to have a set of possible states, and to transition randomly from one state to a next state, where the probability of transitioning to a particular next state depends only on, and is determined fully by, the state it is currently in. This property of being memoryless, with the next state's probabilities determined only by the current state, is called the Markov property.

Random-walk-based algorithm

Consider 2-SAT, the problem of determining whether or not a boolean formula in conjunctive normal form with at most two literals per clause has a satisfying assignment. The following algorithm is based on a random walk. First start with an arbitrary assignment to the $n$ variables, and then, up to $2n^2$ times, if the formula is not satisfied, choose a random unsatisfied clause and a random variable from that clause, and flip the value of that variable. If at any point during the $2n^2$ iterations the formula is satisfied, return that satisfying assignment; else return "unsatisfiable".

In order to analyze this algorithm, we'll consider how likely it is to stumble upon a particular satisfying assignment $S$ during the $2n^2$ iterations. For each iteration $t$, consider $X_t$, the number of variables that agree with $S$ immediately before the $t$th iteration of the algorithm. The algorithm can then be projected to a random walk on $X_t$: each time a variable is flipped, it either increases or decreases $X_t$ by one. The algorithm succeeds if $X_t$ ever reaches $n$. Since the flipped variable at each step is one of two variables in an unsatisfied clause, and since every clause is satisfied in $S$, there is at least a one-half chance that the flipped variable started out disagreeing with $S$ and flipped into a state that agreed with $S$. We express this as:

$$\Pr[X_{t+1} = i + 1 \mid X_t = i] \ge \frac{1}{2}.$$

Consider the pessimal version of this walk, where $\Pr[X_{t+1} = i + 1 \mid X_t = i]$ is exactly one half at every step (unless $X_t = 0$, in which case $X_{t+1} = 1$ with certainty). This gives the process the property of being memoryless: the transitions at each step are independent of the history of previous states, so this pessimal version is a Markov chain. Also assume that $X_0 = 0$. We will show that in this case, $E[\min\{t : X_t = n\}] = n^2$. This implies that, if the formula is satisfiable, the expected number of iterations it takes for the algorithm to succeed is at most $n^2$, so by doing $2n^2$ iterations, we obtain (by Markov's inequality) a chance of at most $1/2$ of this algorithm reporting a false negative.

To show that $E[\min\{t : X_t = n\}] = n^2$, let $h_j$ be the expected number of steps to reach $n$ on our random walk when we start at $j$. We get a straightforward recurrence by expanding $h_j$ in terms of the number of steps to reach $n$ one step after we leave $j$:

$$h_j = 1 + \frac{1}{2} h_{j-1} + \frac{1}{2} h_{j+1} \implies h_j - h_{j+1} = h_{j-1} - h_j + 2.$$

Using the base case that $h_0 - h_1 = 1$ (from state 0 the walk always moves to state 1), a simple induction yields $h_j - h_{j+1} = 2j + 1$. Then, using $h_n = 0$, telescoping gives us:

$$h_0 = h_n + \sum_{j=0}^{n-1} (h_j - h_{j+1}) = \sum_{j=0}^{n-1} (2j + 1) = 2 \cdot \frac{n(n-1)}{2} + n = n^2.$$
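To make the procedure concrete, here is a minimal Python sketch of the random-walk algorithm just described (the function name and the literal encoding are our own, chosen for this example):

import random

def random_walk_2sat(clauses, n, flips=None):
    """Random-walk 2-SAT sketch. A clause is a pair of nonzero ints;
    literal k means variable |k|, negated if k < 0 (variables are 1..n).
    Returns a satisfying assignment, or None after 2n^2 flips
    (a false negative with probability at most 1/2 if satisfiable)."""
    if flips is None:
        flips = 2 * n * n
    assign = {v: random.choice([True, False]) for v in range(1, n + 1)}

    def satisfied(clause):
        return any((lit > 0) == assign[abs(lit)] for lit in clause)

    for _ in range(flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign                          # formula satisfied
        lit = random.choice(random.choice(unsat))  # random variable of a random unsatisfied clause
        assign[abs(lit)] = not assign[abs(lit)]    # flip it
    return assign if all(satisfied(c) for c in clauses) else None

# Example: (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(random_walk_2sat([(1, 2), (-1, 3), (-2, -3)], n=3))

Rescanning all clauses on every step keeps the sketch short; a more careful implementation would maintain the set of unsatisfied clauses incrementally.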

Finite Markov chains

A finite Markov chain is a random walk on a directed graph. The directed graph has its edges labeled with transition probabilities, in such a way that the law of total probability holds (i.e., for each vertex, the sum of its outgoing edge labels is exactly 1). Each vertex in this graph is called a state of the Markov chain. A Markov chain can equivalently be described by a transition matrix $P$, where $P_{ij}$ is the probability of transitioning to state $j$ if the preceding state is $i$. Thus if $X_t$ denotes the state of the Markov chain at time $t$, we write $\Pr[X_{t+1} = j \mid X_t = i] = P_{ij}$. The property that $P$ satisfies the law of total probability means that it is a stochastic matrix.

[Figure 1: a Markov chain, depicted in graph form. Figure 2: the same Markov chain as a transition matrix.]

Let $p^{(t)}$ denote the probability distribution for the state of the Markov chain at time $t$, so that $p^{(t)}_i = \Pr[X_t = i]$. Then $p^{(0)}$ gives the initial probability distribution of the Markov chain: $p^{(0)} = (1, 0, \dots, 0)$ means to start in state 1, and $p^{(0)} = (\frac{1}{n}, \dots, \frac{1}{n})$ means to start in a uniformly random state. The matrix formulation of a Markov chain gives us the following equivalences:

$$p^{(t+1)} = p^{(t)} P, \qquad p^{(t)} = p^{(0)} P^t.$$

Definition. An irreducible Markov chain is one whose corresponding graph is strongly connected. The period of a state $i$ in a Markov chain is the greatest common divisor of all $n$ such that $(P^n)_{ii} > 0$. A Markov chain is aperiodic if the period of every state is 1. For example, a bipartite Markov chain is never aperiodic, and given any Markov chain $P$, we can construct an aperiodic one by taking $P' = \frac{1}{2}(I + P)$.

All the Markov chains we'll consider will be finite, irreducible, and aperiodic. This implies that there is some $N > 0$ such that for all $n \ge N$, the entries of the matrix $P^n$ are strictly positive (i.e., nonzero).

Stationary distributions

Definition. A probability distribution $\pi$ is a stationary distribution of a Markov chain with transition matrix $P$ if $\pi = \pi P$. Consequently, for all states $j$, $\pi_j = \sum_i \pi_i P_{ij}$.

A stationary distribution of $P$ can be thought of as a fixed point, or as a left eigenvector of eigenvalue 1. The matrix $P$ may have eigenvalues not equal to 1, but since the law of total probability must be satisfied, the corresponding eigenvectors will not be probability distributions (they must have negative components).

The following technical definitions and lemma will be needed to prove the Fundamental Theorem of Markov Chains later on:

Definition. The hitting time $T_{ij}$ is $\min\{t > 0 : X_t = j\}$ given that $X_0 = i$. That is, it is a random variable tracking the number of steps it takes to get to $j$ from $i$. Let $h_{ij} = E[T_{ij}]$. Then $h_{ii}$ is the expected time of first return to state $i$ after starting from state $i$.

Lemma. For all states $x$ and $y$ in an irreducible Markov chain, $h_{xy} < \infty$.

Proof. There are $r \in \mathbb{N}$ and $\varepsilon > 0$ such that for all states $z$ and $w$ and for some $j \le r$, it holds that $(P^j)_{zw} > \varepsilon$. This is true since we can guarantee a nonzero uniform lower bound for all nonzero entries in $\{(P^j)_{zw}\}$ just because there are a finite number of such entries, and we can guarantee that there exists a nonzero $(P^j)_{zw}$ for each pair of states $z, w$ by taking a high enough $r$, because of irreducibility. So consider a random walk on the Markov chain starting at $x$. Within the first $r$ steps, there is at least an $\varepsilon$ chance of the walk having reached $y$. Suppose it doesn't: then the walk is at some state $x' \ne y$ after the $r$th step. There is then at least an $\varepsilon$ chance of reaching $y$ in $r$ more steps starting from $x'$. Continuing on in this fashion, the expected value of $T_{xy}$ is at most:

$$h_{xy} \le r + (1-\varepsilon) r + (1-\varepsilon)^2 r + \cdots = r \sum_{k=0}^{\infty} (1-\varepsilon)^k = \frac{r}{\varepsilon}.$$

Theorem (Fundamental Theorem of Markov Chains). For any finite, irreducible, aperiodic Markov chain:
(1) There exists some stationary distribution $\pi$, and $\pi_x > 0$ for all states $x$.
(2) $\pi$ is unique.
(3) $\pi_i = 1/h_{ii}$.
(4) For all states $i, j$: $\lim_{t \to \infty} (P^t)_{ij} = \pi_j$.

Proof of (1). Fix a start state, which we label 0. For a random walk started at 0, define $\tilde{\pi}_y$ to be the expected number of times the walk visits state $y$ before returning to 0, and let $T$ be the first return time to 0. Then:

$$\tilde{\pi}_y = \sum_{t=0}^{\infty} \Pr[X_t = y, \; T > t] \le \sum_{t=0}^{\infty} \Pr[T > t] = h_{00} < \infty.$$

We claim that $\tilde{\pi}$ is a fixed point of $P$. We verify this by calculating the $y$th component of $\tilde{\pi} P$, first by expanding the definition of $\tilde{\pi}_x$, then by substituting $\sum_x \Pr[X_t = x, \; T > t] \, P_{xy} = \Pr[X_{t+1} = y, \; T \ge t+1]$:

$$(\tilde{\pi} P)_y = \sum_x \tilde{\pi}_x P_{xy} = \sum_x \left( \sum_{t=0}^{\infty} \Pr[X_t = x, \; T > t] \right) P_{xy} = \sum_{t=0}^{\infty} \Pr[X_{t+1} = y, \; T \ge t+1] = \sum_{t=1}^{\infty} \Pr[X_t = y, \; T \ge t]$$
$$= \tilde{\pi}_y - \Pr[X_0 = y, \; T > 0] + \sum_{t=1}^{\infty} \Pr[X_t = y, \; T = t] = \tilde{\pi}_y - \Pr[X_0 = y] + \Pr[X_T = y] = \tilde{\pi}_y,$$

where the last equality is because we set $X_0 = 0$, and so also $X_T = 0$ since $T$ is the time it takes to get back to 0; hence $\Pr[X_0 = y]$ and $\Pr[X_T = y]$ are equal for every $y$. So we conclude that $\tilde{\pi} P = \tilde{\pi}$, and all we have to do to get our stationary distribution is to normalize $\tilde{\pi}$:

$$\pi_x = \frac{\tilde{\pi}_x}{\sum_y \tilde{\pi}_y} = \frac{\tilde{\pi}_x}{h_{00}}.$$
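The construction in this proof is easy to check numerically. The following sketch (Python with numpy; the 3-state chain is a made-up example) estimates $\tilde{\pi}_y$ by simulating excursions from state 0, and compares the normalized result with the stationary distribution computed as a left eigenvector of eigenvalue 1:

import random
import numpy as np

# A made-up 3-state chain; each row sums to 1, so P is stochastic.
P = [[0.5, 0.25, 0.25],
     [0.2, 0.6,  0.2 ],
     [0.3, 0.3,  0.4 ]]

def excursion_visits(P, start=0):
    # Count visits to each state from time 0 until just before the
    # first return to `start`, so the counts sum to the return time T.
    visits = [0] * len(P)
    state = start
    while True:
        visits[state] += 1
        state = random.choices(range(len(P)), weights=P[state])[0]
        if state == start:
            return visits

trials = 200_000
pi_tilde = np.zeros(len(P))
for _ in range(trials):
    pi_tilde += excursion_visits(P)
pi_tilde /= trials
print("estimated h_00  =", pi_tilde.sum())            # sum of pi~ is h_00
print("pi~ normalized  =", pi_tilde / pi_tilde.sum())

# Stationary distribution: left eigenvector of P with eigenvalue 1.
evals, evecs = np.linalg.eig(np.array(P).T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
print("pi (eigenvector)=", pi / pi.sum())

Up to sampling error the two printed vectors agree, and the estimate of $h_{00}$ matches $1/\pi_0$, which also previews part (3) of the theorem.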

Proof of (2). Suppose $\pi$ and $\pi'$ are two stationary distributions of $P$; we'll show that $\pi = \pi'$. Let $x = \operatorname{argmin}_y \, \pi_y / \pi'_y$, and let $c = \pi_x / \pi'_x$, so that $\pi_y \ge c \, \pi'_y$ for all $y$. Then:

$$c \, \pi'_x = \pi_x = \sum_y \pi_y P_{yx} \ge \sum_y c \, \pi'_y P_{yx} = c \sum_y \pi'_y P_{yx} = c \, \pi'_x.$$

We substituted using $\pi_y \ge c \, \pi'_y$ and found that that inequality is actually an equality, so $\pi_y = c \, \pi'_y$ for every $y$ with $P_{yx} > 0$; repeating the argument and using irreducibility gives $\pi = c \, \pi'$ everywhere. Therefore, since $\pi$ and $\pi'$ are both normalized and $\pi = c \, \pi'$, we must have $c = 1$ and $\pi = \pi'$.

Proof of (3). Remember from our construction of the stationary distribution $\pi$ that:

$$\pi_x = \frac{\tilde{\pi}_x}{\sum_y \tilde{\pi}_y} = \frac{\tilde{\pi}_x}{h_{00}}.$$

By uniqueness, we could have run the same construction with $x$ as the start state and come up with the same value for $\pi_x$. So then:

$$\pi_x = \frac{\tilde{\pi}_x}{h_{xx}} = \frac{E_x[\text{number of visits to } x \text{ before returning to } x]}{h_{xx}} = \frac{1}{h_{xx}},$$

since a walk started at $x$ visits $x$ exactly once (at time 0) before its first return.

Proof of (4). Because $P$ is irreducible and aperiodic, there exists an $r$ such that all the entries of $P^r$ are greater than zero. Thus, for some sufficiently small $\delta > 0$, $(P^r)_{xy} \ge \delta \pi_y$ for all states $x, y$. Let $\Pi$ be the matrix formed by using $\pi$ as each of its rows, so that:

$$\Pi = \begin{pmatrix} \pi \\ \pi \\ \vdots \\ \pi \end{pmatrix}.$$

Then we can define a stochastic matrix $Q$ by setting:

$$P^r = \delta \Pi + (1 - \delta) Q,$$

where $Q = (P^r - \delta \Pi)/(1 - \delta)$ must be stochastic: its entries are nonnegative by our choice of $\delta$, and its rows sum to 1 because the rows of $P^r$ and $\Pi$ do. Now, two properties of any stochastic matrix $M$ are that $M \Pi = \Pi$, and that if $\pi M = \pi$ then $\Pi M = \Pi$. To show that $\lim_{t \to \infty} (P^t)_{xy} = \pi_y$ for all $x, y$, we will show that:

$$P^{rk} = \left(1 - (1-\delta)^k\right) \Pi + (1-\delta)^k Q^k.$$

This proceeds by induction on $k$. Since we can write $P^{rk} - \Pi = (1-\delta)^k (Q^k - \Pi)$, and $\Pi P = \Pi$, we can multiply both sides by $P^r$ and get:

$$P^{rk + r} - \Pi = (1-\delta)^k \left( Q^k P^r - \Pi \right) = (1-\delta)^k \left( Q^k (\delta \Pi + (1-\delta) Q) - \Pi \right) = (1-\delta)^k \left( \delta Q^k \Pi + (1-\delta) Q^{k+1} - \Pi \right)$$
$$= (1-\delta)^k \left( (1-\delta) Q^{k+1} - (1-\delta) \Pi \right) = (1-\delta)^{k+1} \left( Q^{k+1} - \Pi \right),$$

so that:

$$P^{r(k+1)} = \left(1 - (1-\delta)^{k+1}\right) \Pi + (1-\delta)^{k+1} Q^{k+1}.$$

Since $(1-\delta)^k \to 0$ and the entries of $Q^k - \Pi$ are bounded, $P^{rk} \to \Pi$ entrywise as $k \to \infty$; the intermediate powers converge too, since $P^{rk+s} - \Pi = (P^{rk} - \Pi) P^s$. Hence $\lim_{t \to \infty} (P^t)_{xy} = \pi_y$.
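The decomposition used in the proof of (4) can likewise be checked numerically. This sketch (Python with numpy, reusing the made-up chain from above) computes the largest valid $\delta$, forms $Q$, and watches $\max_{x,y} |(P^{rk})_{xy} - \pi_y|$ decay like $(1-\delta)^k$:

import numpy as np

P = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.6,  0.2 ],
              [0.3, 0.3,  0.4 ]])

# Stationary distribution pi, and the matrix Pi whose rows are all pi.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi /= pi.sum()
Pi = np.tile(pi, (len(P), 1))

r = 1                          # P itself already has all entries positive here
Pr = np.linalg.matrix_power(P, r)
delta = (Pr / Pi).min()        # largest delta with (P^r)_xy >= delta * pi_y
Q = (Pr - delta * Pi) / (1 - delta)
print("rows of Q sum to 1:", Q.sum(axis=1))
print("Q is nonnegative:  ", (Q >= -1e-12).all())

for k in (1, 5, 10, 20):
    gap = np.abs(np.linalg.matrix_power(P, r * k) - Pi).max()
    print(f"k={k}: max|P^(rk) - Pi| = {gap:.2e}, (1-delta)^k = {(1 - delta) ** k:.2e}")

The observed gap stays below $(1-\delta)^k$, exactly as the induction predicts.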

Maximum Matching in Regular Bipartite Graphs

Hall's marriage theorem guarantees that regular bipartite graphs (bipartite graphs in which all vertices have the same degree) always have a perfect matching. In this section, we will give an algorithm for finding a perfect matching of a regular bipartite graph $G$ with $n$ nodes on each side and $m$ edges in $O(n \log n)$ expected time.

A partial matching of $G$ is a set of edges with no shared nodes. Given a partial matching, an augmenting path for that matching is a path which starts and ends at two distinct unmatched nodes and alternates between edges in the matching and edges outside of it.

Theorem. A matching $M$ is maximum if and only if there is no augmenting path for $M$.

Proof. If there is an augmenting path for $M$, then we can use it to construct a bigger matching. Each node on the path except for the two ends is an endpoint of a matched edge on the path, and the ends of the path are two distinct unmatched nodes, so each node is visited exactly once. Therefore, since the matched and unmatched edges of the augmenting path alternate, we can swap them: remove the path's matched edges from $M$ and add its unmatched edges, and the result is still a matching. Since the path started and ended with unmatched edges, there is one more edge in this new matching than in the original matching, making it larger than $M$ and meaning $M$ is not maximum. Therefore, $M$ is maximum only if there is no augmenting path.

Next, we want to show that if there is no augmenting path for $M$, then $M$ is maximum. Assume, for the contrapositive, that $M$ is not maximum. Then there exists some larger matching $M'$ with $|M'| > |M|$. Since $M'$ contains at least one more edge than $M$, there are at least two more matched vertices in $M'$ than in $M$. Start at some vertex matched in $M'$ but not in $M$, and follow edges matched in one of the two matchings (the path must alternate between them, since each vertex is matched to at most one other vertex in each matching) until either reaching a vertex matched in $M'$ but not in $M$, or vice versa. If the final vertex is matched in $M'$ but not $M$, then this is an augmenting path for $M$. If the final vertex is matched in $M$ but not $M'$, then this path has the same number of vertices matched only in $M$ as matched only in $M'$. Since we know $M'$ has more vertices matched than $M$, we will always eventually find an augmenting path for $M$.

The traditional approach to finding a perfect matching of $G$ is the augmenting path algorithm: repeatedly find an augmenting path alternating between matched and unmatched edges until no augmenting path can be found. This algorithm runs in $O(mn)$ time. We will use a randomized algorithm to improve on this, running in expected time $O(n \log n)$. We will still use the augmenting path strategy, but instead of performing breadth-first search to find an augmenting path, we will use a random walk.

On each iteration of the algorithm, we start with a partial matching $M$. Our graph is bipartite, so let $L$ be one of the independent sets of nodes and $R$ the other; we will refer to $L$ as being on the left and $R$ on the right. Construct a directed graph $G'$ corresponding to our original graph $G$ by having all edges in $M$ go from $R$ to $L$, and all other edges go from $L$ to $R$. Add two vertices $s$ and $t$ such that there is an edge from $s$ to each unmatched vertex in $L$ for each edge out of it, and an edge from each unmatched vertex in $R$ to $t$ for each edge into it. Add edges from $t$ to $s$ equal in number to the edges out of $s$ (which is equal to the number of edges into $t$). Contract each edge in $M$ to a single vertex carrying the edges of both endpoints. By this construction, and because $G$ is regular, each vertex in $G'$ has the same in-degree as out-degree, so $G'$ is Eulerian. $G$ has an augmenting path if and only if there is a path from an unmatched vertex in $L$ to an unmatched vertex in $R$ in $G'$, or equivalently, by our construction, if and only if $G'$ has a cycle through $s$.

Our algorithm will be to do a random walk starting from $s$ until it returns to $s$. The expected time to find an augmenting path is the expected travel time from $s$ back to $s$, $E[T_{s,s}] = h_{s,s}$. As we showed earlier, $h_{s,s} = 1/\pi_s$.

Claim. In an Eulerian directed graph, the stationary distribution of the random walk is $\pi_v = d_{\text{out}}(v)/m$, where $d_{\text{out}}(v)$ is the out-degree of vertex $v$ (which is equal to its in-degree) and $m$ is the total number of edges in the graph.

Proof. In the stationary distribution, $\pi_v$ is the sum of the in-flow to $v$ from all vertices:

$$\pi_v = \sum_{u : (u,v) \in E} \pi_u \, p_{uv} = \sum_{u : (u,v) \in E} \frac{d_{\text{out}}(u)}{m} \cdot \frac{1}{d_{\text{out}}(u)} = \frac{d_{\text{in}}(v)}{m} = \frac{d_{\text{out}}(v)}{m}.$$

Therefore, $h_{s,s} = m'/d_{\text{out}}(s)$, where $m'$ is the number of edges of $G'$. In each iteration of the algorithm, we add an edge to our matching. In iteration $i$, there are $i$ edges already in the matching, meaning that $d_{\text{out}}(s) = (n - i)d$, where $d$ is the degree of $G$; and $G'$ has $m' \le 4dn$ edges (at most $dn$ edges of $G$, plus $(n-i)d$ edges each out of $s$, into $t$, and from $t$ to $s$). Then:

$$h_{s,s} \le \frac{4dn}{d(n-i)} = \frac{4n}{n-i}.$$

Since we must find a cycle in each of the $n$ iterations, the total expected running time of the algorithm is at most:

$$\sum_{i=0}^{n-1} \frac{4n}{n-i} = 4n \sum_{k=1}^{n} \frac{1}{k} = O(n \log n).$$

The algorithm runs in $O(n \log n)$ expected time, as desired.
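The claim that drives this bound is easy to verify numerically. Below is a small Python/numpy check on a made-up Eulerian directed multigraph (in-degree equals out-degree at every vertex), confirming that $\pi_v = d_{\text{out}}(v)/m$:

import numpy as np

# A made-up Eulerian directed multigraph on 3 vertices.
edges = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1), (2, 0), (0, 2)]
n, m = 3, len(edges)

d_out = np.zeros(n)
for u, v in edges:
    d_out[u] += 1

# Random walk: at u, traverse one of u's outgoing edges uniformly at random.
P = np.zeros((n, n))
for u, v in edges:
    P[u, v] += 1 / d_out[u]

evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi /= pi.sum()
print("pi        =", pi)          # (0.375, 0.25, 0.375)
print("d_out / m =", d_out / m)   # matches pi

Since $\pi_s = d_{\text{out}}(s)/m'$, this also confirms the step $h_{s,s} = 1/\pi_s = m'/d_{\text{out}}(s)$ used in the running-time bound above.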