ECE-517: Reinforcement Learning in Artificial Intelligence. Lecture 4: Discrete-Time Markov Chains


ECE-517: Reinforcement Learning in Artificial Intelligence
Lecture 4: Discrete-Time Markov Chains
September 1, 2015
Dr. Itamar Arel
College of Engineering, Department of Electrical Engineering & Computer Science
The University of Tennessee, Fall 2015

Simple DTMCs

[Figure: two example chains. The first has two states with self-loop probabilities 1-p and 1-q and cross transitions p and q; the second has three states with edges labeled a through f.]

States can be labeled 0, 1, 2, 3, ...
At every time slot, a jump decision is made randomly based on the current state.
(Sometimes the arrow pointing back to the same state is omitted.)
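As a concrete illustration (not from the original slides), here is a minimal sketch of how such a chain can be simulated; the probabilities, state labels, and function name are illustrative assumptions:

```python
import numpy as np

def simulate_dtmc(P, x0, n_steps, seed=None):
    """Simulate a homogeneous DTMC: at each slot, jump according to
    the row of P for the current state."""
    rng = np.random.default_rng(seed)
    path = [x0]
    for _ in range(n_steps):
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return path

# Two-state chain: stay at 0 w.p. 1-p, jump 0->1 w.p. p,
# jump 1->0 w.p. q, stay at 1 w.p. 1-q (values are made up)
p, q = 0.3, 0.6
P = np.array([[1 - p, p],
              [q, 1 - q]])
print(simulate_dtmc(P, x0=0, n_steps=20))
```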

1-D Random Walk

[Figure: walker on the integer line; sample path of X(t) against slotted time, stepping right w.p. p and left w.p. 1-p.]

Time is slotted. The walker flips a coin every time slot to decide which way to go.
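The walk is itself a DTMC on the integers. A short simulation sketch, with illustrative parameters:

```python
import numpy as np

def random_walk(p, n_steps, seed=None):
    """1-D random walk: step +1 w.p. p, -1 w.p. 1-p, starting at 0."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([1, -1], size=n_steps, p=[p, 1 - p])
    return np.concatenate(([0], np.cumsum(steps)))

print(random_walk(p=0.5, n_steps=10, seed=0))
```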

Single Server Queue

[Figure: a single-server queue with Bernoulli(p) arrivals and Geom(q) service times.]

Consider a queue at a supermarket. In every time slot:
A customer arrives with probability p.
The head-of-line (HoL) customer leaves with probability q.
We'd like to learn about the behavior of such a system.
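A simulation sketch of this queue; the within-slot ordering (arrival handled before departure) is one common convention, and the parameter values are made up:

```python
import numpy as np

def simulate_queue(p, q, n_slots, seed=None):
    """Queue length at the end of each slot for Bernoulli(p) arrivals
    and Geom(q) service (arrival processed before departure here)."""
    rng = np.random.default_rng(seed)
    n, lengths = 0, []
    for _ in range(n_slots):
        if rng.random() < p:            # a customer arrives
            n += 1
        if n > 0 and rng.random() < q:  # the HoL customer leaves
            n -= 1
        lengths.append(n)
    return lengths

sizes = simulate_queue(p=0.3, q=0.5, n_slots=100_000, seed=0)
print("average queue length:", np.mean(sizes))
```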

Birth-Death Chain

[Figure: chain on states 0, 1, 2, 3, ... with transitions only between neighboring states.]

Our queue can be modeled by a birth-death chain (a.k.a. a Geom/Geom/1 queue).
Want to know: queue size distribution, average waiting time, etc.

Markov Property

The future is independent of the past given the present; in other words, memoryless.
We've mentioned memoryless distributions: Exponential and Geometric.
Useful for modeling and analyzing real systems.
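Memorylessness of the Geometric distribution can be checked empirically: P(X > m+n | X > m) = P(X > n). A small sketch (an added illustration, with made-up values of q, m, and n):

```python
import numpy as np

rng = np.random.default_rng(0)
q, m, n = 0.3, 4, 3
X = rng.geometric(q, size=1_000_000)   # slots until first "success"

lhs = np.mean(X[X > m] > m + n)  # P(X > m+n | X > m)
rhs = np.mean(X > n)             # P(X > n)
print(lhs, rhs)                  # the two estimates nearly coincide
```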

Discrete Time Random Process (DTRP)

Random process: an indexed family of random variables.
Let {X_n} be a DTRP consisting of a sequence of independent, identically distributed (i.i.d.) random variables with common CDF F_X(x). This sequence is called an i.i.d. random process.
Example: a sequence of Bernoulli trials (coin flips).
In networking, traffic may obey a Bernoulli i.i.d. arrival pattern.
In reality, some degree of dependency/correlation exists between consecutive elements of a DTRP.
Example: correlated packet arrivals (video/audio stream).
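To see the difference, a toy sketch (an added illustration, not from the slides) contrasting i.i.d. Bernoulli arrivals with a correlated arrival stream:

```python
import numpy as np

rng = np.random.default_rng(1)

# i.i.d. Bernoulli arrivals: each slot is independent of all others
iid = (rng.random(20) < 0.3).astype(int)

# Correlated arrivals: an arrival is more likely right after another
# (a toy on/off source; the 0.7/0.2 values are made up)
corr, prev = [], 0
for _ in range(20):
    prev = int(rng.random() < (0.7 if prev else 0.2))
    corr.append(prev)

print(iid)
print(np.array(corr))
```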

Discrete Time Markov Chains

A sequence of random variables {X_n} is called a Markov chain if it has the Markov property:

P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_0 = i_0) = P(X_{n+1} = j | X_n = i)

States are usually labeled {0, 1, 2, ...}; the state space can be finite or infinite.

Transition Probability Matrix

p_ij = P(X_{n+1} = j | X_n = i) is the probability of transitioning from state i to state j.
We will assume the MC is homogeneous/stationary: p_ij is independent of the time index n.
Transition probability matrix: P = {p_ij}.
Two-state MC:

P = | 1-p   p  |
    |  q   1-q |

Stationary Distribution

Define the row vector p_k = (p_k(0), p_k(1), ...), where p_k(i) = P(X_k = i); then p_{k+1} = p_k P.
Stationary distribution: p = lim_{k -> ∞} p_k, if the limit exists.
If p exists, we can solve for it from

p = p P,  with sum_i p(i) = 1.

These are called balance equations: transitions in and out of each state i are balanced.
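One simple way to compute p numerically is to iterate p_{k+1} = p_k P until the vector stops changing. A minimal sketch, assuming a chain for which the limit exists; the parameter values are illustrative:

```python
import numpy as np

def stationary(P, tol=1e-12, max_iter=100_000):
    """Power iteration: repeatedly apply p_{k+1} = p_k P starting
    from the uniform distribution until convergence."""
    p = np.full(len(P), 1.0 / len(P))
    for _ in range(max_iter):
        p_next = p @ P
        if np.abs(p_next - p).max() < tol:
            break
        p = p_next
    return p_next

p_, q_ = 0.3, 0.6
P = np.array([[1 - p_, p_],
              [q_, 1 - q_]])
print(stationary(P))   # -> [0.6667, 0.3333] for these values
```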

General Comment & Conditions for p to Exist (I)

If we partition all the states into two sets, then transitions between the two sets must be balanced. This can be derived easily from the balance equations.

Definitions:
State j is reachable from state i if p_ij^(m) > 0 for some m >= 0.
States i and j communicate if each is reachable from the other.
The Markov chain is irreducible if all states communicate.
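For a finite chain, irreducibility can be checked mechanically from the zero pattern of P. A sketch (function name and example matrix are illustrative):

```python
import numpy as np

def is_irreducible(P):
    """Every state reachable from every other state, checked by
    repeated squaring of the adjacency matrix (with self-loops)."""
    n = len(P)
    R = (np.asarray(P) > 0).astype(int) + np.eye(n, dtype=int)
    for _ in range(int(np.ceil(np.log2(max(n, 2)))) + 1):
        R = (R @ R > 0).astype(int)   # doubles the covered path length
    return bool((R > 0).all())

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 1.0, 0.0]])
print(is_irreducible(P))   # True: all states communicate
```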

Conditions for p to Exist (I) (cont'd)

Condition: the Markov chain is irreducible.
Counter-examples: [Figure: two reducible chains, one containing an absorbing transition with p = 1 from which part of the state space cannot be reached.]

Aperiodic Markov chain
Counter-example: [Figure: a chain whose transitions all have probability 1, so it cycles deterministically through its states.]

Conditions for p to Exist (II)

For the Markov chain to be recurrent, all states i must be recurrent, i.e. starting from state i the chain returns to i with probability 1; otherwise the chain is transient.
For a recurrent MC, state i is positive recurrent if E(T_i) < ∞, where T_i is the time between visits to state i; otherwise the state is null-recurrent.

Solving for p: Example for a Two-State Markov Chain

[Figure: two-state chain; from state 0, stay w.p. 1-p and jump to 1 w.p. p; from state 1, jump to 0 w.p. q and stay w.p. 1-q.]

Balancing the flow between the two states gives p(0) p = p(1) q, and normalization gives p(0) + p(1) = 1, so

p(0) = q/(p+q),  p(1) = p/(p+q).
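A numerical cross-check of this closed form, solving the balance equations as a small linear system (parameter values are illustrative):

```python
import numpy as np

p_, q_ = 0.3, 0.6
P = np.array([[1 - p_, p_],
              [q_, 1 - q_]])

# Solve p = pP together with p(0) + p(1) = 1 as a least-squares system
A = np.vstack([P.T - np.eye(2), np.ones((1, 2))])
b = np.array([0.0, 0.0, 1.0])
p_vec, *_ = np.linalg.lstsq(A, b, rcond=None)

print(p_vec)                             # numerical solution
print(q_ / (p_ + q_), p_ / (p_ + q_))    # closed form: 2/3, 1/3
```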

Birth-Death Chain

[Figure: birth-death chain on states 0, 1, 2, 3, ...; each state moves up w.p. u, down w.p. d, and stays put w.p. 1-u-d (1-u at state 0).]

Arrival w.p. p; departure w.p. q.
Let u = p(1-q), d = q(1-p), and r = u/d.
Balance equations:

p(0) u = p(1) d
(u + d) p(i) = u p(i-1) + d p(i+1),  i >= 1

Birth-Death Chain (cont'd)

Continuing this pattern, we observe that p(i-1) u = p(i) d. Equivalently, we can draw a bisection between state i-1 and state i: the probability flow across the cut must balance.
Therefore we have p(i) = r p(i-1) = r^i p(0), where r = u/d.
What we are interested in is the stationary distribution of the states, so we normalize: sum_i p(i) = 1.

Birth-Death Chain (cont'd)

Assuming r < 1, the geometric series sum_{i>=0} r^i p(0) = p(0)/(1-r) = 1 gives p(0) = 1 - r, and hence

p(i) = (1 - r) r^i,  i = 0, 1, 2, ...

i.e. the stationary queue size is geometrically distributed.
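Finally, a self-contained sketch that checks the geometric form against a long simulation of the birth-death chain; all parameter values are illustrative:

```python
import numpy as np

p, q = 0.3, 0.5                     # illustrative arrival/departure probs
u, d = p * (1 - q), q * (1 - p)     # u = 0.15, d = 0.35
r = u / d                           # r < 1, so the chain is stable

rng = np.random.default_rng(0)
n, counts = 0, np.zeros(20)
for x in rng.random(500_000):
    if x < u:                       # birth: queue grows
        n += 1
    elif x < u + d and n > 0:       # death: HoL customer leaves
        n -= 1
    if n < len(counts):
        counts[n] += 1

empirical = counts / counts.sum()
theory = (1 - r) * r ** np.arange(len(counts))
print(np.round(empirical[:5], 3))
print(np.round(theory[:5], 3))      # the two rows should nearly match
```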