Discrete Markov Chain. Theory and use

Discrete Markov Chain. Theory and use Andres Vallone PhD Student andres.vallone@predoc.uam.es 2016

Contents 1 Introduction 2 Concept and definition Examples Transitions Matrix Chains Classification 3 Empirical Uses Introduction Test the Markov Property Estimating the Transition Matrix Example

Introduction

Introduction Introduction What is the probability of have a nice day tomorrow if today is raining? What is the probability of buy a Coke when you first buy a Pepsi? What is the probability of a state move out of poverty? How was the evolution of the income distribution in a country?

What is a DMC? Concept and definition The Markov Chain is a discrete-time stochastic process Describes a system whose states change over time Changes are governed by a probability distribution Is a memorylessness process: the next state only depends upon the current system state,the path to the present state is not relevant In order to make a formal definition suppose that exists: Set of independent state S = {s 1,..., s k } A sequence of random variables X 0, X 1, X 2,... with values in S that describes the state of the system in time t A set of transition rules specifying the probability of move from the state i in time t to the state j in time t + 1 If the rules are independent of time, the DMC is homogeneous.

Discrete Markov Chain Concept and definition Definition A random process X 0, X 1, X 2,... with finite state space S = {s 1,..., s k } is said to be a Markov homogeneous chain if i, j {i,..., k} and i 0,..., i n 1 {1,..., k} P [ X t+1 = j X 0 = i 0, X 1 = i 1,..., X t 1 = in 1, X t = i ] = = P [ X t+1 = j X t = i ] = p i,j p i,j is a conditional probability: defines the probability that the chain jumps to state j, at time t + 1, given that it is in state i at time t, p i,j is a rule of movement, since S has size k, exist k k rules of movement. The set of rules specify the transition matrix.

The Transition Matrix Concept and definition p 11 p 12... p 1k p 21 p 22... p 2k P =...... p k1 p k2... p kk Each p ij denote the probability of one step in time Are probabilities p i,j 0 All state are exclusive k p ij = 1 i=1 In case of an homogeneous DMC p ij = p i,j t Homogeneity implies that every time the chain is in state s, the probability of jumping to another state is the same.

Example 1 Examples According to Kemeny, Snell, and Thompson 1, Land of Oz is blessed by many things, but not by good weather. They never have two nice days in a row. If they have a nice day, they are just as likely to have snow as rain the next day. If they have snow or rain, they have an even chance of having the same the next day. If there is change from snow or rain, only half of the time is this a change to a nice day What are the transition matrix in this case? 1 J. G. Kemeny, J. L. Snell, G. L. Thompson, Introduction to Finite Mathematics, 3rd ed. (Englewood Cliffs, NJ: Prentice-Hall, 1974)

Example 1 Examples Rain 0.25 0.25 0.25 Transition Matrix 1/2 1/4 1/4 P = 1/2 0 1/2 1/4 1/4 1/2 Nice Snow 0.25

Example 2 Examples Coke or Pepsi? Given that a person s last cola purchase was Coke, there is a 90% chance that his next cola purchase will also be Coke. If a person s last cola purchase was Pepsi, there is an 80% chance that his next cola purchase will also be Pepsi. What are the transition matrix in this case?

Example 2 Examples Pepsi 0.8 0.2 0.1 Transition Matrix [ ] 0.9 0.1 P = 0.2 0.8 Coke 0.9

The Transition Matrix Transitions Matrix Consider the example 1, what is the probability of have an snowy day in two days, if today is raining? Have a snowy day in two days denote as p (2) 13 is the disjoint union of: (a) it is rainy tomorrow and snowy two days from now (b) it is nice tomorrow and snowy two days from now (c) it is snowy tomorrow and snowy two days from now p (2) 13 = p 11p 13 + p 12 p 21 + p 13 p 33 This is just what is done in obtaining the 1,3-entry of the product of P with itself. Day 1 1/2 1/4 1/4 P = 1/2 0 1/2 1/4 1/4 1/2 Day 2 1/2 1/4 1/4 P = 1/2 0 1/2 1/4 1/4 1/2

The Transition Matrix Transitions Matrix Proposition (Chapman-Kolgomorov equation) For any n > 0,m > 0, i S, j S p m+n ij = k S p n i,kp m jk

The Transition Matrix Transitions Matrix Proposition (Chapman-Kolgomorov equation) For any n > 0,m > 0, i S, j S p m+n ij = k S p n i,kp m jk Theorem Let P be the transition matrix of a Markov chain. The ij-th entry of the matrix P n gives the probability that the Markov chain, starting in state s i, will be in state s j after n steps.

Long therm behaviour Transitions Matrix Let π 0 a vector k 1 defining for all state i the probability that the Markov chain is initially at i π 0 (i) = P [X 0 = i] : i = 1,..., k Consider a vector π n (j) k 1 that defines the probabilities that the Markov chain is at state j after n step π n (j) = { π n (1),..., π n (k) } Theorem Let P be the transition matrix of a Markov chain, and let π 0 be the probability vector which represents the starting distribution. Then the probability that the chain is in state s i after n steps is the ith entry of π n = P n π 0

Chains Classification Periodic State Chains Classification 1 1 2 3 4 5 1 0 1 0 0 0 0.25 0 0 0.25 0 0 0 P = 0 0 0 P 2n 0 0 0 0 = 0.25 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0.25 0 0 0.25 0 0 0 P 2n+1 0.25 0 0 0.25 This means at every = 0 0 0 state i, if n is an even 0.25 0 0 0.25 number P n (i, i) > 0 0 0 0

Chains Classification Absorbing State Chains Classification 1 2 3 4 5 1 1 1 0 0 0 0 0 0 0 P = 0 0 0 0 0 0 0 0 0 0 1 rows are not identical, i.e, the chain does not forget the initial state

Chains Classification Absorbing State Chains Classification 1 2 3 4 5 1 1 1 0 0 0 0 P n 0.75 0 0 0 0.25 = 0 0 0 0.25 0 0 0 0.75 0 0 0 0 1 states 2, 3, 4 are transient: p n i2 = p n i3 = pn i4 = 0 i.e. after passing a large amount of time the chain will stop visiting this states.

Chains Classification Reducible Chains Chains Classification 0.1 0.2 0.7 1 2 3 4 5 0.2 0.8 0.9 0.3 0.7 0.6 0.2 0.3 0.2 0.8 0 0. 0 0.1 0.9 0 0 0 P = 0 0 0.3 0.7 0 0 0 0.2 0.6 0.2 0 0 0 0.7 0.3 0.1 0.9 0 0 0 P n 0.1 0.9 0 0 0 = 0 0 0.2 0.6 0 0 0.2 0.6 0.2 0 0 0.2 0.6 0.2

Chains Classification Ergodic Chains Chains Classification State j is reachable from state i if probability to go from i to j in n 0 steps is greater then 0 A subset X of the state space S is closed if p ij = 0, i S and j S A closed set of states is irreducible if any state j S is reachable from any state i S A Markov Chain is irreducible if the state space S is irreducible

Chains Classification Irreducible Chains Chains Classification Ergodic p 21 p 32 1 2 3 p 11 p 12 p 23 p 33 Reducible p 21 p 43 1 2 3 4 p 11 p12 p 23 p 33 p34 p 44 p 25 5

Steady State chain Chains Classification Theorem (Basic Limit Theorem) Any ergodic, aperiodic, Markov chain defined on a set of state S and with a stochastic transition matrix P has a unique stationary distribution π with all its component positive. Furthermore, let P n be the n-th power of P, then lim π (j) P n = π n

Empirical Uses

Introduction Empirical Uses Introduction In social science is not common work with discrete state variables. Wrong discretizaton produce wrong result by (cite) In necessary test if the random variables follow a Markov process Is necessary find the Transition matrix

Empirical Uses Test the Markov Property Bickenbach, F., & Bode, E. (2003). Test the Markov Property First, homogeneity over time (time-stationarity) can be checked by dividing the entire sample into T periods, and testing whether or not the transition matrices estimated from each of the T subsamples differ significantly from the matrix estimated from the entire sample. H 0 : t p ij (t) = p ij t = 1,..., T H 1 : t p ij (t) ij ) Q (T ) = T 2 ( ) N (ˆpij (t) ˆp ij N n i (t) asy χ 2 (a i 1) (b 1 1) t=1 i=1 j B i ˆp ij i=1

Empirical Uses Test the Markov Property Bickenbach, F., & Bode, E. (2003). Test the Markov Property ) Q (T ) = T 2 ( ) N (ˆpij (t) ˆp ij N n i (t) asy χ 2 (a i 1) (b 1 1) t=1 i=1 j B i ˆp ij i=1 ˆp ij denotes the probability of transition from the i-th to the j-th class estimated from the entire sample ˆp ij (t) the corresponding transition probability estimated from the t-th sub-sample ˆp ij (t) = n ij (t) /n ij n i (t) denotes the absolute number of observations initially falling into the i-th class within the t-th sub-sample. b i = B i is the number of positive entries in the i-th row of the matrix for the entire sample B i = { j : ˆp ij } a 1 = A i is the number of sub-samples (t) in which observations for the i-th row are available A i = { t : n (t) > 0 }

Empirical Uses Estimating the Transition Matrix Estimating the Transition Matrix We need to estimate all the p ij parameters of the matrix k k of probabilities. p ij = P [ X t+1 = j X t = i ] We have a sample from the chain x n x 1,..., x n that is a realization of the Markov chain X1 n n P [X1 n = x n 1 ] = P [ ] X t = x t X t 1 = x t 1 t=1 Re-written in term of transitions probability p ij n L (p) = p xt 1,x t t=2 Defining n ij as the number of times i is followed by j in X1 n k k L (p) = i=1 j=1 p n ij ij

Empirical Uses Estimating the Transition Matrix Estimating the Transition Matrix L (p) = k k p n ij ij i=1 j=1 Taking log and remembering that j p ij = 1 to find p ij is necessary solve: Max p ij l (p) = i,j j n ij log p ij λ i i=1 j The first order conditions of this problem are: p ij 1 l p ij = n ij p ij λ i 0 (1) l λ i = j p ij 1 0 (2)

Empirical Uses Estimating the Transition Matrix From (1) Estimating the Transition Matrix p ij = n ij λ i (3) knowing that we have k lagrange multipliers and using (2) and (3) k j=1 k j=1 n ij λ i 1 n ij = λ i (4) With (3) and (4) ˆp ij = n ij k n ij j=1

Example Empirical Uses Example EN proceso...