Markov chains - Randomness and Computation


Markov chains
Randomness and Computation (or, Randomized Algorithms)
Mary Cryan, School of Informatics, University of Edinburgh

Markov processes

A Markov process is a collection of random variables X = {X(t) : t ∈ T}, usually with T = N. X(t) (sometimes written as X_t) is the state of the process at time t ∈ T; it is an element of some state set Ω, usually a discrete finite set, sometimes countably infinite. A Markov process with the memoryless property is one where X_{t+1} depends on the previous state X_t, but on none of the earlier states.

Definition (Definition 7.1)
A discrete-time stochastic process X_0, X_1, X_2, ... on the state space Ω is said to be a Markov chain if
Pr[X_t = a_t | X_{t-1} = a_{t-1}, ..., X_0 = a_0] = Pr[X_t = a_t | X_{t-1} = a_{t-1}].
We will often denote the Markov chain by the name M, and write M[a_{t-1}, a_t] to denote the probability Pr[X_t = a_t | X_{t-1} = a_{t-1}].

Markov chains

We can think about the Markov chain M in terms of a matrix of dimensions |Ω| × |Ω| (if Ω is finite), or of infinite dimension if Ω is countably infinite:

    M[a_1, a_1]  M[a_1, a_2]  ...  M[a_1, a_j]  ...
    M[a_2, a_1]  M[a_2, a_2]  ...  M[a_2, a_j]  ...
    ...
    M[a_j, a_1]  M[a_j, a_2]  ...  M[a_j, a_j]  ...
    ...

This is often called the transition matrix of the Markov chain.
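As a minimal sketch of the row-as-distribution view of M (not from the slides; the 3-state matrix below is a made-up illustration), one can simulate a trajectory X(0), X(1), ... by repeatedly sampling the next state from the current state's row:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state transition matrix (illustrative only, not from the slides).
# Row a is the distribution of the next state given current state a, so every row sums to 1.
M = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2],
              [0.5, 0.0, 0.5]])

def sample_trajectory(M, start, steps):
    """Simulate X(0), X(1), ..., X(steps): each next state is drawn from the
    current state's row of M, so it depends only on the current state."""
    states = [start]
    for _ in range(steps):
        states.append(int(rng.choice(len(M), p=M[states[-1]])))
    return states

print(sample_trajectory(M, start=0, steps=10))
```

Each step looks only at the current state, which is exactly the memoryless property of Definition 7.1.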

Transition matrix

This representation assumes that the states of the Markov chain have been put in 1-1 correspondence with N. This is certainly possible (as we assume Ω is countably infinite at worst), but it is not always the case that this ordering is particularly natural. In many situations (especially when Ω is finite but exponentially sized), we will avoid writing down the transition matrix at all - as an example, see the description of the contingency-tables chain. Note that for every a ∈ Ω we have Σ_{b ∈ Ω} M[a, b] = 1 (so a's row sums to 1).

Iterations of the Markov chain

Suppose we start our Markov process with the initial state X(0) being some fixed a ∈ Ω. The "next state" X(1) is distributed according to a's row of the transition matrix M (with probability M[a, b] of X(1) becoming b). If we define p(0) to be the row vector with p_a(0) = 1 and all other entries 0, then we can define the probability distribution p(1) by p(1) = p(0) M, and the probability that X(1) is b at the next step is p_b(1) (which equals M[a, b]), for every b ∈ Ω. The "next state" X(1) is a random variable, and its distribution is the vector p(1) above (with values summing to 1).

If we then carry out a second step of the Markov chain, the random variable X(2) will be distributed according to p(2), where
p(2) = p(1) M = p(0) M M = p(0) M^2.
And so on: after t steps of the Markov chain M, the random variable X(t) will be distributed according to p(t), where
p(t) = p(0) M^t.
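A minimal sketch of this computation (using the same illustrative 3-state matrix as before, not the example chain from the slides):

```python
import numpy as np

# Illustrative 3-state chain (assumed for this sketch, not the slides' example).
M = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2],
              [0.5, 0.0, 0.5]])

# p(0) puts all probability mass on state 0; p(t) = p(0) M^t is the distribution of X(t).
p0 = np.array([1.0, 0.0, 0.0])
for t in range(1, 6):
    pt = p0 @ np.linalg.matrix_power(M, t)
    print(f"p({t}) = {np.round(pt, 4)}, sums to {pt.sum():.4f}")
```

Each p(t) remains a probability distribution, as the printed row sums confirm.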

Example Markov chain

Let us look at a concrete example of a Markov chain on a state set Ω of four states, with probabilities given by a 4 × 4 transition matrix M (the numerical entries are given on the slide). Notice that every row of the matrix sums to 1, as required; this is not the case for the columns (and does not have to be). In the graph representation, the "states" are vertices and the "transitions" of the chain are directed edges (labelled with the appropriate probability).

Consider running this Markov chain from a fixed starting state of Ω, so that p(0) has a 1 in that state's position and 0 everywhere else. Evaluating p(1) = p(0) M gives the starting state's row of M; evaluating p(2) = p(1) M gives the two-step distribution; and so on. We would expect these p(t) values to be different if we had started with a different initial state. However, these alternative t-step distributions can be easily calculated and compared if we have pre-computed M^t.

Markov chains for Randomized Algorithms

The example graph (and transition matrix) above is just a "toy" example. In our world, we will want to exploit Markov chains to obtain randomized algorithms.

In the 2-SAT example of Section 7.1 of [MU], we are concerned with assignments (functions) A of boolean values to the n logical variables (and which will achieve the maximum number of satisfied clauses). The transitions of the modified Markov chain (Y_t) for 2-SAT are designed to model transitions back and forth between "better" and "worse" assignments. However, the states of the Markov chain (Y_t) are the natural numbers 0, 1, ..., n, with n being the number of variables; this is a fairly small state space. Next Tuesday we will prove that, for a 2-SAT instance which does have a satisfying assignment, the expected number of steps before the Markov chain finds one is at most n^2. This is an expected hitting time.

Another application of Markov chains arises when we want to randomly sample from (or approximately count) the set of combinatorial structures satisfying some constraint. In these cases the elements of the state space Ω are the individual combinatorial structures themselves, and this space is often very large (potentially exponential in the size of the input). One example is the set of contingency tables Σ_{r,c} for given row sums r = (r_1, ..., r_m) and column sums c = (c_1, ..., c_n). With large state spaces of combinatorial elements, we don't explicitly "write down" the transition matrix of a Markov chain on the space - we just give (randomized) rules explaining how we move (randomly) from one element to another. In this week's tutorial, you are asked to prove that the contingency-tables Markov chain converges to the uniform distribution on Ω = Σ_{r,c}.
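To make the "give move rules instead of a matrix" idea concrete, here is a minimal sketch of the random-walk style 2-SAT search discussed above; the clause encoding, helper names and step bound are my own illustrative choices, not code from [MU].

```python
import random

def random_walk_2sat(clauses, n, max_steps):
    """Random-walk search for a satisfying assignment of a 2-CNF formula.
    Variables are 1..n; a literal is +i (variable i true) or -i (negated);
    each clause is a pair of literals.  While some clause is unsatisfied,
    pick one such clause and flip one of its two variables at random."""
    assignment = {i: random.choice([True, False]) for i in range(1, n + 1)}

    def literal_true(lit):
        return assignment[abs(lit)] == (lit > 0)

    for _ in range(max_steps):
        unsatisfied = [c for c in clauses if not any(literal_true(l) for l in c)]
        if not unsatisfied:
            return assignment                  # satisfying assignment found
        clause = random.choice(unsatisfied)    # any unsatisfied clause will do
        var = abs(random.choice(clause))       # flip one of its two variables ...
        assignment[var] = not assignment[var]  # ... chosen uniformly at random
    return None                                # no success within max_steps

# Tiny instance: (x1 or x2) and (not x1 or x3) and (not x2 or not x3).
print(random_walk_2sat([(1, 2), (-1, 3), (-2, -3)], n=3, max_steps=2 * 3 ** 2))
```

Note that the transition matrix of the underlying chain is never written down; the randomized move rule defines it implicitly.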

Classification of states

To analyse the behaviour (both short-term and long-term) of a Markov chain M on the state space Ω, we often work from the graph representation of M's transitions.

Definition (Definition 7.2)
For a, b ∈ Ω, we say that state b is accessible from state a if there is some n ∈ N such that M^n[a, b] > 0. If a and b are both accessible from one another, we say that they communicate, and write a ↔ b.

Definition (Definition 7.3)
A Markov chain M on the finite state space Ω is said to be irreducible if all states of Ω belong to the same communicating class. Equivalently, for every X, Y ∈ Ω there is some path Z_0 = X, Z_1, ..., Z_l = Y of states from Ω connecting X and Y such that M[Z_j, Z_{j+1}] > 0 for every 0 ≤ j < l.

Let us consider our example chain: three of its four states form a maximal communicating class in the graph. The remaining state is an isolated communicating class - it can be reached from the other three, but it has no transitions back out. Our example is therefore not irreducible.

We define the concept of "recurrence" in terms of a parameter r^t_{Z,W}, where
r^t_{Z,W} =def Pr[X(t) = W and X(s) ≠ W for all 1 ≤ s < t | X(0) = Z].

Definition (Definition 7.4)
If M is a Markov chain and Z ∈ Ω is a state of that chain, we say Z is recurrent if Σ_{t≥1} r^t_{Z,Z} = 1, and it is transient if Σ_{t≥1} r^t_{Z,Z} < 1. A Markov chain is recurrent if every state is recurrent.

In our example, the only recurrent state is the isolated one.

The expected "time to travel" will be important for some analyses. For a pair of given states Z, W ∈ Ω, we define this value as
h_{Z,W} =def Σ_{t≥1} t · r^t_{Z,W}.

Definition (Definition 7.5)
A recurrent state Z of a Markov chain M is positive recurrent if h_{Z,Z} < ∞. Otherwise, it is null recurrent.

Lemma (Lemma 7.5)
In a Markov chain M on a finite state space Ω, (i) at least one state is recurrent, and (ii) all recurrent states are positive recurrent.

Definition (Definition 7.6)
A state Z of a Markov chain M is periodic if there exists an integer k > 1 such that Pr[X(t + s) = Z | X(t) = Z] is non-zero only if s is divisible by k. A discrete Markov chain is periodic if any of its states is periodic. Otherwise (all states aperiodic), we say that M is aperiodic.

Definition (Definition 7.7)
An aperiodic, positive recurrent state Z ∈ Ω of a Markov chain M is said to be an ergodic state. The Markov chain is said to be ergodic if all its states are ergodic.
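For a finite chain these structural checks can be done mechanically. Here is a minimal sketch of computing the communicating classes (and hence testing irreducibility); the 4-state matrix is a made-up illustration in the spirit of the running example, not the matrix from the slides.

```python
import numpy as np
from itertools import product

def communicating_classes(M):
    """Partition the states of a finite chain into communicating classes.
    reach[a][b] is True iff b is accessible from a along positive-probability edges."""
    n = len(M)
    reach = [[a == b or M[a][b] > 0 for b in range(n)] for a in range(n)]
    for k, a, b in product(range(n), repeat=3):        # transitive closure (k varies slowest)
        reach[a][b] = reach[a][b] or (reach[a][k] and reach[k][b])
    classes = []
    for a in range(n):
        if not any(a in cls for cls in classes):
            classes.append({b for b in range(n) if reach[a][b] and reach[b][a]})
    return classes

# Made-up 4-state chain in the spirit of the running example: three states
# communicate with each other, the fourth can be reached but never left.
M = np.array([[0.50, 0.25, 0.25, 0.00],
              [0.25, 0.50, 0.00, 0.25],
              [0.25, 0.00, 0.50, 0.25],
              [0.00, 0.00, 0.00, 1.00]])
classes = communicating_classes(M)
print(classes)                                   # [{0, 1, 2}, {3}]
print("irreducible:", len(classes) == 1)         # False: this chain is not irreducible
```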

Ergodic Markov chains

Corollary (Corollary 7.6)
Any finite, irreducible and aperiodic Markov chain is an ergodic chain.

Proof. By Lemma 7.5, every chain with a finite state space has at least one recurrent state. If the chain is irreducible, then all states can be reached from one another (with positive probability), so all states are hence recurrent. By Lemma 7.5, this means all states are positive recurrent. Hence, by Definition 7.7 (we have positive recurrence, irreducibility and aperiodicity), the chain is ergodic.

Take-away: for Markov chains over a finite state space, we only need to check aperiodicity and irreducibility (for all pairs of states) to show ergodicity.

Why do we care about states or Markov chains being ergodic? Because an ergodic Markov chain has a unique stationary distribution.

Definition (Definition 7.8)
A stationary distribution of a Markov chain M on Ω is a probability distribution π on Ω such that π = π M.

Stationary distributions are essentially "settling-down" distributions (in the limit). The "settling-down" is to a distribution, not to a particular state.

Theorem (Theorem 7.7)
Any finite, irreducible and aperiodic Markov chain has the following properties:
1. The chain has a unique stationary distribution π = (π_1, ..., π_{|Ω|}).
2. For all X, Y ∈ Ω, lim_{t→∞} M^t[X, Y] exists and is independent of X.
3. π_Y = lim_{t→∞} M^t[X, Y] = 1/h_{Y,Y}.

We are not going to prove Theorem 7.7 in this course. We will later see some ways of bounding the number of steps needed for the chain to approach its stationary distribution.

Reading and Doing

Reading: you will want to read Sections 7.1, 7.2 and 7.3 from the book. Note that I have used M (the name of the Markov chain) to refer to the transition matrix, as opposed to the book's P.

Doing: consider an example of contingency tables where we have three row sums r and three column sums c, and suppose we take a given 3 × 3 table X as our starting state (the specific numerical values of r, c and X are given on the slide). Work out the subset of contingency tables which can be reached from X in one transition of the Markov chain, and also work out the probability of each such transition. Also do the two assigned exercises from Chapter 7 of the [MU] book.
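As a complement to Theorem 7.7, here is a minimal sketch of computing the stationary distribution of a small ergodic chain numerically; the 3-state matrix is an illustrative assumption, not an example from the slides.

```python
import numpy as np

# Illustrative 3-state chain (assumed for this sketch): finite, irreducible and
# aperiodic, so Theorem 7.7 gives a unique stationary distribution pi with pi = pi M.
M = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Method 1: power iteration.  p(t) = p(0) M^t approaches pi from any start p(0).
p = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    p = p @ M
print("power iteration:", np.round(p, 6))

# Method 2: pi is the left eigenvector of M for eigenvalue 1, normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(M.T)
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = v / v.sum()
print("eigenvector    :", np.round(pi, 6))
print("pi = pi M holds:", bool(np.allclose(pi, pi @ M)))
```

Both methods agree, illustrating that the t-step distributions settle down to π regardless of the starting state.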