Markov Chains

Markov chains:
- A powerful tool for sampling from complicated distributions: they rely only on local moves to explore the state space.
- Many use Markov chains to model events that arise in nature; we create Markov chains to explore and sample from problems.

2SAT (a simulation sketch appears below, after the general formulation):
- Fix some satisfying assignment A, and let f(k) be the expected time for all n variables to match A when k currently match. Each step flips one variable of an unsatisfied clause; since A satisfies that clause, the number of matching variables increases with probability at least 1/2, so the recurrence below is a pessimistic bound.
- Then f(n) = 0, f(0) = 1 + f(1), and f(k) = 1 + (1/2)(f(k+1) + f(k-1)).
- Rewrite: f(0) - f(1) = 1 and f(k) - f(k+1) = 2 + (f(k-1) - f(k)).
- So f(k) - f(k+1) = 2k + 1; summing, f(0) = 1 + 3 + ... + (2n-1) = n^2.
- So, by Markov's inequality, we find a satisfying assignment with probability at least 1/2 within 2n^2 steps, and with high probability in O(n^2 log n) time.

More general formulation: Markov chain.
- State space S. A Markov chain begins in a start state X_0 and moves from state to state, so the output of the chain is a sequence of states X_0, X_1, ... = {X_t}_{t≥0}.
- Movement is controlled by a matrix of transition probabilities: p_ij = the probability that the next state is j given that the current state is i. Thus Σ_j p_ij = 1 for every i in S.
- Implicit in the definition is the memorylessness property:
  Pr[X_{t+1} = j | X_0 = i_0, X_1 = i_1, ..., X_t = i] = Pr[X_{t+1} = j | X_t = i] = p_ij.
- The initial state X_0 can come from any probability distribution, or might be fixed (a trivial probability distribution). The distribution for X_0 leads to a distribution over sequences {X_t}.
- Suppose X_t has distribution q (a row vector; q_i is the probability of state i). Then X_{t+1} has distribution qP. Why? Because Pr[X_{t+1} = j] = Σ_i q_i p_ij = (qP)_j.
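Returning to the 2SAT walk above: a minimal simulation sketch, assuming a formula encoded as pairs of signed literals (+i / -i for variable i true/false). The function name, encoding, and example formula are illustrative, not from the notes.

```python
import random

def random_walk_2sat(clauses, n, max_steps=None):
    """The random walk from the notes: start from an arbitrary assignment;
    while some clause is unsatisfied, pick one and flip a uniformly random
    one of its two variables. Runs for 2n^2 steps by default, so for a
    satisfiable formula it succeeds with probability at least 1/2."""
    if max_steps is None:
        max_steps = 2 * n * n
    assign = [random.choice([False, True]) for _ in range(n + 1)]  # index 0 unused

    def satisfied(lit):
        return assign[abs(lit)] == (lit > 0)

    for _ in range(max_steps):
        unsat = [c for c in clauses if not (satisfied(c[0]) or satisfied(c[1]))]
        if not unsat:
            return assign[1:]                      # satisfying assignment found
        lit = random.choice(random.choice(unsat))  # random literal of a random unsatisfied clause
        assign[abs(lit)] = not assign[abs(lit)]    # local move: flip that variable
    return None  # give up; for a satisfiable formula this happens with prob. <= 1/2

# (x1 or x2) and (not x1 or x2): satisfied by any assignment with x2 = True.
print(random_walk_2sat([(1, 2), (-1, 2)], n=2))
```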

Observe Pr[X_{t+r} = j | X_t = i] = (P^r)_ij.

Graph of a MC:
- Vertex for every state.
- Edge (i, j) if p_ij > 0, with edge weight p_ij; weighted outdegree 1.
- Possible state sequences are paths through the graph.

Stationary distribution:
- A π such that πP = π (a left eigenvector with eigenvalue 1).
- Steady-state behavior of the chain: if in the stationary distribution, stay there.
- Note that a draw from the stationary distribution is a sample from the state space, so if we can build a chain with the right stationary distribution, we can sample (a numerical sketch follows at the end of this section). Lots of chains have one; to say which, we need definitions.
- Things to rule out: the infinite directed line (no stationary distribution), the 2-cycle (a stationary distribution exists, but the chain oscillates and need not converge to it), the disconnected graph (multiple stationary distributions).

Irreducibility: any state can reach any other state, i.e., there is a path between any two states, i.e., the graph is a single strong component.

Persistence/Transience:
- r_ij^(t) is the probability of first hitting state j at time t, given start state i.
- f_ij is the probability of eventually reaching j from i, so f_ij = Σ_t r_ij^(t).
- The expected time to reach j from i is the hitting time h_ij = Σ_t t · r_ij^(t).
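The evolution q → qP and the stationary condition πP = π are easy to check numerically. A minimal numpy sketch; the 3-state transition matrix is made up for illustration:

```python
import numpy as np

# A made-up transition matrix P (each row sums to 1): P[i, j] = p_ij.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

q = np.array([1.0, 0.0, 0.0])  # X_0 fixed at state 0 (a trivial distribution)
for _ in range(200):
    q = q @ P                  # if X_t has distribution q, X_{t+1} has distribution qP

# q has converged to the stationary distribution: pi P = pi.
print(q, np.allclose(q @ P, q))
```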

If f_ij < 1 then h_ij = ∞, since we might never reach j. The converse is not always true. If f_ii < 1, the state is transient; else it is persistent. If additionally h_ii = ∞, it is null persistent.

Persistence in finite graphs: the graph has strong components; a final strong component has no outgoing edges.
- Nonfinal components: once we leave a nonfinal component, we cannot return. From a nonfinal component there is a nonzero probability of leaving within n steps, so we are guaranteed to leave eventually. So vertices in nonfinal components are transient.
- Final components: once in a final component, the chain stays in it. Any two vertices in the same strong component have a path between them, so there is a nonzero probability of reaching one from the other in (say) n steps. So vertices in final components are persistent. There is a geometric distribution dominating the time to reach, so the expected time is finite: not null-persistent.
- Conclusion: in a finite chain there are no null-persistent states; in a finite irreducible chain, all states are non-null persistent (no transient states).

Periodicity:
- The periodicity of a state is the maximum T such that the chain, started at that state, can return to it only at times of the form a + Ti for integer i.
- A chain is periodic if some state has periodicity > 1. In the graph, all cycles containing that state have length a multiple of T.
- Easy to eliminate: add self-loops. This slows down the chain but is otherwise the same (see the sketch below).

Ergodic: aperiodic and non-null persistent; means the chain might be in the state at any time in the (sufficiently far) future.
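The self-loop trick is easy to see on the 2-cycle ruled out earlier. A minimal numpy sketch with a made-up 2-state chain: powers of the periodic chain oscillate forever, while the lazy (self-loop) version converges.

```python
import numpy as np

P = np.array([[0.0, 1.0],      # the 2-cycle: periodicity T = 2
              [1.0, 0.0]])
lazy = 0.5 * (P + np.eye(2))   # add self-loops: stay put w.p. 1/2, else step as before

q = np.array([1.0, 0.0])
for t in range(1, 5):
    q = q @ P
    print(t, q)                # oscillates between (0,1) and (1,0): no convergence

q = np.array([1.0, 0.0])
for _ in range(50):
    q = q @ lazy
print(q)                       # converges to the stationary distribution (1/2, 1/2)
```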

Fundamental Theorem of Markov chains: any finite, irreducible, aperiodic Markov chain satisfies:
- All states are ergodic (reachable at any time in the sufficiently far future).
- There is a unique stationary distribution π, with all π_i > 0.
- f_ii = 1 and h_ii = 1/π_i.
- The number of times state i is visited in t steps approaches tπ_i in the limit of large t.

We justify all except uniqueness here. Finite irreducible aperiodic implies ergodic (since finite irreducible implies non-null persistent).

Intuitions for the quantities: h_ii is the expected return time, so we hit i once every h_ii steps on average, a 1/h_ii fraction of the time; hence π_i = 1/h_ii, i.e., h_ii = 1/π_i. If we start in the stationary distribution, the tπ_i visit count follows from linearity of expectation.

Random walks on undirected graphs:
- General Markov chains are directed graphs, but undirected graphs have some very nice properties.
- Take a connected, non-bipartite undirected graph on n vertices with m edges. States are vertices; move to a uniformly chosen neighbor, so p_uv = 1/d(u) for every neighbor v of u.
- Stationary distribution: π_v = d(v)/2m; uniqueness says this is the only one. Deduce h_vv = 2m/d(v). (A numerical check follows the definitions below.)

Definitions:
- The hitting time h_uv is the expected time to reach v from u.
- The commute time is h_uv + h_vu.
- C_u(G) is the expected time to visit all vertices of G starting at u; the cover time is max_u C_u(G) (so in fact it is the max over any starting distribution).
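A minimal numpy sketch checking π_v = d(v)/2m and h_vv = 2m/d(v). The graph (a triangle with a pendant vertex) is made up for illustration; it is connected and non-bipartite as required.

```python
import numpy as np

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # triangle 0-1-2 plus pendant 3
n = len(adj)
m = sum(len(nbrs) for nbrs in adj.values()) // 2    # number of edges

# Walk transition matrix: p_uv = 1/d(u) for each neighbor v of u.
P = np.zeros((n, n))
for u, nbrs in adj.items():
    for v in nbrs:
        P[u, v] = 1.0 / len(nbrs)

pi = np.full(n, 1.0 / n)
for _ in range(500):
    pi = pi @ P                                     # power iteration to the stationary dist.

deg = np.array([len(adj[u]) for u in range(n)])
print(pi)                # computed stationary distribution
print(deg / (2 * m))     # matches d(v)/2m
print(2 * m / deg)       # h_vv = 2m/d(v), the expected return times
```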

Let's analyze the max cover time.

Examples:
- Clique: commute time Θ(n), cover time Θ(n log n).
- Line: commute time between the ends is Θ(n^2) (estimated numerically below).
- Lollipop: h_uv = Θ(n^3) while h_vu = Θ(n^2) (a big difference!). Also note the lollipop has edges added to the line but a higher cover time: adding edges can increase the cover time even though it improves connectivity.

General graphs, adjacent vertices:
- Lemma: for adjacent (u, v), h_uv + h_vu ≤ 2m.
- Proof: define a new Markov chain on directed edges, where the state is the edge just traversed. Its transition matrix is doubly stochastic: column sums are 1 (exactly d(v) edges can transit to edge (v, w), each does so with probability 1/d(v)). In the homework, we show such matrices have a uniform stationary distribution. Deduce π_e = 1/2m over the 2m directed edges, thus h_ee = 2m.
- So suppose the original chain is at vertex v, having arrived via (u, v). We expect to traverse (u, v) again within 2m steps, and by then the walk has gone from v to u and back to v. So, conditioning on the arrival method, the commute time is ≤ 2m (thanks to memorylessness).

General graph cover time:
- Theorem: the cover time is O(mn).
- Proof: find a spanning tree. A DFS of the tree crosses each tree edge once in each direction, giving an order v_1, ..., v_{2n-1} on the visits. The time for the walk to visit the vertices in this order is upper bounded by the sum of the commute times along the tour, and consecutive vertices are adjacent, so each commute time is O(m). Total time O(mn).
- Tight for the lollipop, loose for the line.
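A minimal Monte Carlo sketch estimating the commute time between the two ends of the line, as promised above; the estimates roughly quadruple as n doubles, consistent with Θ(n^2). The function name and trial count are illustrative.

```python
import random

def path_commute_time(n, trials=200):
    """Estimate the commute time between the ends of a path on vertices
    0..n-1 by simulating the walk there and back, averaged over trials."""
    total = 0
    for _ in range(trials):
        pos, target, steps = 0, n - 1, 0
        while True:
            if pos == 0:
                pos = 1                        # endpoints have a single neighbor
            elif pos == n - 1:
                pos = n - 2
            else:
                pos += random.choice([-1, 1])  # interior vertex: uniform neighbor
            steps += 1
            if pos == target:
                if target == 0:
                    break                      # back at the start: commute complete
                target = 0                     # reached the far end; now return
        total += steps
    return total / trials

for n in (8, 16, 32):
    print(n, path_commute_time(n))
```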

Tighter analysis: an analogy with electrical networks.
- Assume unit resistance on every edge.
- Kirchhoff's law: current (rate of transitions) is conserved at each node.
- Ohm's law: voltage = current × resistance.
- Together these give an effective resistance R_uv between any two vertices.

Theorem: the commute time satisfies h_uv + h_vu = 2m R_uv. (This tightens the previous lemma, since R_uv ≤ 1 for adjacent u, v.)

Proof:
- Suppose we inject d(x) amperes into every vertex x and remove 2m amperes at v. Let φ_uv denote the resulting voltage at u with respect to v.
- Ohm: the current from u to a neighbor w is φ_uv − φ_wv.
- Kirchhoff at u: d(u) = Σ_{w ∈ N(u)} (φ_uv − φ_wv) = d(u)·φ_uv − Σ_{w ∈ N(u)} φ_wv.
- Also, h_uv = Σ_{w ∈ N(u)} (1/d(u))(1 + h_wv), i.e., d(u) = d(u)·h_uv − Σ_{w ∈ N(u)} h_wv.
- The voltages and hitting times satisfy the same linear system (with φ_vv = h_vv = 0), so φ_uv = h_uv.
- By the same argument with the currents reversed, if we insert 2m amperes at u and remove d(x) from every x, the voltage at u with respect to v is h_vu.
- Adding the two linear systems (superposition), the injections cancel everywhere except 2m in at u and 2m out at v, and the voltage difference between u and v is h_uv + h_vu. Now apply Ohm's law: h_uv + h_vu = 2m R_uv.
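The theorem can be checked numerically. With unit resistances, the effective resistance is R_uv = (e_u − e_v)^T L^+ (e_u − e_v), where L is the graph Laplacian and L^+ its pseudoinverse (a standard fact, not proved in these notes). A minimal numpy sketch on the same made-up triangle-plus-pendant graph as before:

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]  # triangle 0-1-2 plus pendant 3
n, m = 4, len(edges)

# Graph Laplacian L = D - A, with unit resistance on every edge.
L = np.zeros((n, n))
for u, v in edges:
    L[u, u] += 1; L[v, v] += 1
    L[u, v] -= 1; L[v, u] -= 1

Lpinv = np.linalg.pinv(L)                 # Moore-Penrose pseudoinverse

def effective_resistance(u, v):
    e = np.zeros(n)
    e[u], e[v] = 1.0, -1.0
    return e @ Lpinv @ e

for u, v in [(0, 1), (0, 3)]:
    print(u, v, 2 * m * effective_resistance(u, v))  # commute time = 2m R_uv
# For the adjacent pair (0, 1) the value is <= 2m = 8, matching the earlier lemma.
```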