ECE-517: Reinforcement Learning in Artificial Intelligence. Lecture 4: Discrete-Time Markov Chains

ECE-517: Reinforcement Learning in Artificial Intelligence Lecture 4: Discrete-Time Markov Chains September 1, 215 Dr. Itamar Arel College of Engineering Department of Electrical Engineering & Computer Science The University of Tennessee Fall 215 1

Simple DTMCs 1-p p q 1 1-q d c e a 2 b 1 f States can be labeled (,)1,2,3, At every time slot a jump decision is made randomly based on the current state (Sometimes the arrow pointing back to the same state is omitted) ECE 517 Reinforcement Learning in AI 2

1-D Random Walk 1-p p Time is slotted X(t) The walker flips a coin every time slot to decide which way to go ECE 517 Reinforcement Learning in AI 3

Single Server Queue Bernoulli(p) Geom(q) Consider a queue at a supermarket In every time slot: A customer arrives with probability p The HoL customer leaves with probability q We d like to learn about the behavior of such a system ECE 517 Reinforcement Learning in AI 4

Birth-Death Chain 1 2 3 Our queue can be modeled by a Birth-Death Chain (a.k.a. Geom/Geom/1 queue) Want to know: Queue size distribution Average waiting time, etc. Markov Property The Future is independent of the Past given the Present In other words: Memoryless We ve mentioned memoryless distributions: Exponential and Geometric Useful for modeling and analyzing real systems ECE 517 Reinforcement Learning in AI 5

Discrete Time Random Process (DTRP) Random Process: An indexed family of random variables Let X n be a DTRP consisting of a sequence of independent, identically distributed (i.i.d.) random variables with common cdf F X (x). This sequence is called the i.i.d. random process. Example: Sequence of Bernoulli trials (flip of coin) In networking: traffic may obey a Bernoulli i.i.d. arrival Pattern In reality, some degree of dependency/correlation exists between consecutive elements in a DTRP Example: Correlated packet arrivals (video/audio stream) ECE 517 Reinforcement Learning in AI 6

Discrete Time Markov Chains A sequence of random variables {X n } is called a Markov Chain if it has the Markov property: States are usually labeled {(,)1,2, } State space can be finite or infinite Transition Probability Matrix Probability of transitioning from state i to state j We will assume the MC is homogeneous/stationary: independent of time Transition probability matrix: P = {p ij } Two state MC: ECE 517 Reinforcement Learning in AI 7

Stationary Distribution Define then p k+1 = p k P (p is a row vector) Stationary distribution: if the limit exists If p exists, we can solve it by These are called balance equations Transitions in and out of state i are balanced ECE 517 Reinforcement Learning in AI 8

General Comment & Conditions for p to Exist (I) If we partition all the states into two sets, then transitions between the two sets must be balanced. This can be easily derived from the balance equations Definitions: State j is reachable by state i if State i and j commute if they are reachable by each other The Markov chain is irreducible if all states commute ECE 517 Reinforcement Learning in AI 9

Conditions for p to Exist (I) (cont d) Condition: The Markov chain is irreducible Counter-examples: p=1 1 2 3 4 1 2 3 Aperiodic Markov chain Counter-example: 1 1 1 1 2 ECE 517 Reinforcement Learning in AI 1

Conditions for p to Exist (II) For the Markov chain to be recurrent All states i must be recurrent, i.e. Otherwise transient With regards to a recurrent MC State i is recurrent if E(T i )<1, where T i is time between visits to state i Otherwise the state is considered null-recurrent ECE 517 Reinforcement Learning in AI 11

Solving for p: Example for two-state Markov Chain p 1-p 1 1-q q p ECE 517 Reinforcement Learning in AI 12

Birth-Death Chain 1-u 1-u-d 1-u-d 1-u-d u 1 u 2 u 3 u d d d d Arrival w.p. p ; departure w.p. q Let u = p(1-q), d = q(1-p), r = u/d Balance equations: ECE 517 Reinforcement Learning in AI 13

Birth-Death Chain (cont d) Continuing this pattern, we observe that: p(i-1)u = p(i)d Equivalently, we can draw a bi-section between state i and state i-1 Therefore, we have where r = u/d. What we are interested in is the stationary distribution of the states, so ECE 517 Reinforcement Learning in AI 14

Birth-Death Chain (cont d) ECE 517 Reinforcement Learning in AI 15