Markov chains and the number of occurrences of a word in a sequence (4.5–4.9, 11.1,2,4,6)

Prof. Tesler, Math 283, Fall 2018

Locating overlapping occurrences of a word

Consider a (long) single-stranded nucleotide sequence τ = τ_1 ... τ_N and a (short) word w = w_1 ... w_k, e.g., w = GAGA.

    for i = 1 to N-3 {
        if (τ_i τ_{i+1} τ_{i+2} τ_{i+3} == GAGA) { ... }
    }

The above scan takes up to 4N comparisons to locate all occurrences of GAGA (kN comparisons for w of length k). A finite state automaton (FSA) is a machine that can locate all occurrences while only examining each letter of τ once.
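As a point of comparison for the FSA, here is the naive scan written out in R. This sketch is an addition, not part of the original slides; the function name and test string are my own.

    # Naive O(kN) scan: count all (possibly overlapping) occurrences of w in tau.
    count_word_naive <- function(tau, w) {
      k <- nchar(w); N <- nchar(tau)
      count <- 0
      for (i in seq_len(N - k + 1)) {
        if (substr(tau, i, i + k - 1) == w) count <- count + 1
      }
      count
    }
    count_word_naive("CGAGAGAT", "GAGA")   # 2: overlapping occurrences at positions 2 and 4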

Overlapping occurrences of GAGA

[Figure: automaton M1. States ∅ (drawn as 0), G, GA, GAG, GAGA. Edges: ∅ --A,C,T--> ∅, ∅ --G--> G; G --C,T--> ∅, G --G--> G, G --A--> GA; GA --A,C,T--> ∅, GA --G--> GAG; GAG --C,T--> ∅, GAG --G--> G, GAG --A--> GAGA; GAGA --A,C,T--> ∅, GAGA --G--> GAG.]

The states are the nodes ∅, G, GA, GAG, GAGA (prefixes of w). For w = w_1 w_2 ... w_k, there are k+1 states (one for each prefix).
Start in the state ∅ (shown on the figure as 0).
Scan τ = τ_1 τ_2 ... τ_N one character at a time, left to right.
Transition edges: When examining τ_j, move from the current state to the next state according to which edge τ_j is on. For each node u = w_1 ... w_r and each letter x = A, C, G, T, determine the longest suffix s (possibly ∅) of w_1 ... w_r x that is among the states. Draw an edge u --x--> s.
The number of times we are in the state GAGA is the desired count of the number of occurrences.
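The longest-suffix construction can be carried out mechanically. Here is a small R sketch (an added illustration, not from the slides; the function name is my own):

    # Build the FSA transition table for a word w over A,C,G,T.
    # States are numbered 0..k and correspond to the prefixes of w.
    build_fsa <- function(w) {
      alpha <- c("A", "C", "G", "T")
      k <- nchar(w)
      prefixes <- vapply(0:k, function(r) substr(w, 1, r), "")
      trans <- matrix(0L, k + 1, 4, dimnames = list(prefixes, alpha))
      for (i in 1:(k + 1)) for (x in 1:4) {
        ux <- paste0(prefixes[i], alpha[x])
        for (j in 0:nchar(ux)) {        # try suffixes of ux, longest first
          s <- substr(ux, j + 1, nchar(ux))
          m <- match(s, prefixes)
          if (!is.na(m)) { trans[i, x] <- m - 1L; break }
        }
      }
      trans   # entry = next state (0 = empty prefix, k = all of w)
    }
    build_fsa("GAGA")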

Overlapping occurrences of GAGA in τ = CAGAGGTCGAGAGT...

[Same automaton M1 as on the previous slide.]

  t:    1 2 3 4 5 6 7 8 9 10 11 12 13 14
  τ_t:  C A G A G G T C G A  G  A  G  T

  Time t   State at t   τ_t
  1        ∅            C
  2        ∅            A
  3        ∅            G
  4        G            A
  5        GA           G
  6        GAG          G
  7        G            T
  8        ∅            C
  9        ∅            G
  10       G            A
  11       GA           G
  12       GAG          A
  13       GAGA         G
  14       GAG          T
  15       ∅

Non-overlapping occurrences of GAGA

[Figure: automata M1 (overlaps allowed) and M2 (no overlaps). They are identical except for the outgoing edges of state GAGA: in M1, GAGA --G--> GAG and GAGA --A,C,T--> ∅; in M2, GAGA --G--> G and GAGA --A,C,T--> ∅, copying the outgoing edges of ∅.]

For non-overlapping occurrences of w: Replace the outgoing edges from w by copies of the outgoing edges from ∅.
On the previous slide, the time 13 → 14 transition GAGA → GAG changes to GAGA → G.

Motif {GAGA, TAG}, overlaps permitted

[Figure: automaton for the motif, with states ∅ (drawn as 0), G, GA, GAG, GAGA, T, TA, TAG, and edges built by the same longest-suffix rule as before.]

States: All prefixes of all words in the motif. If a prefix occurs multiple times, only create one node for it.
Transition edges: they may jump from one word of the motif to another, e.g., TAG --A--> GA.
Count the number of times we reach the states for any words in the motif (GAGA or TAG).

Markov chains

A Markov chain is similar to a finite state machine, but incorporates probabilities.
Let S be a set of states. We will take S to be a discrete finite set, such as S = {1, 2, ..., s}. Let t = 1, 2, ... denote the time. Let X_1, X_2, ... denote a sequence of random variables, with values in S.
The X_t's form a (first order) Markov chain if they obey these rules:

1. The probability of being in a certain state at time t+1 only depends on the state at time t, not on any earlier states:

     P(X_{t+1} = x_{t+1} | X_1 = x_1, ..., X_t = x_t) = P(X_{t+1} = x_{t+1} | X_t = x_t)

2. The probability of transitioning from state i at time t to state j at time t+1 only depends on i and j, but not on the time t:

     P(X_{t+1} = j | X_t = i) = p_ij at all times t

for some values p_ij, which form an s × s transition matrix.
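To make the definition concrete, here is a short R sketch (an addition, not from the slides) that simulates a chain with a given transition matrix:

    # Simulate n steps of a finite Markov chain with transition matrix P,
    # starting from state i0. Row X[t] of P is the distribution of X[t+1].
    simulate_chain <- function(P, i0, n) {
      X <- integer(n); X[1] <- i0
      for (t in 1:(n - 1))
        X[t + 1] <- sample.int(nrow(P), 1, prob = P[X[t], ])
      X
    }

With the GAGA chain defined below, the fraction of time a long simulation spends in state 5 estimates the rate of occurrences of GAGA.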

Transition matrix

The transition matrix, P, of the Markov chain M1 is

                   To state
  From state       1: ∅           2: G   3: GA   4: GAG   5: GAGA
  1: ∅             p_A+p_C+p_T    p_G    0       0        0
  2: G             p_C+p_T        p_G    p_A     0        0
  3: GA            p_A+p_C+p_T    0      0       p_G      0
  4: GAG           p_C+p_T        p_G    0       0        p_A
  5: GAGA          p_A+p_C+p_T    0      0       p_G      0

with entries denoted P_11, P_12, ..., P_55.

Notice that the entries in each row sum up to p_A + p_C + p_G + p_T = 1.
A matrix with all entries ≥ 0 and all row sums equal to 1 is called a stochastic matrix. The transition matrix of a Markov chain is always stochastic.
All row sums = 1 can be written P·1 = 1, where 1 = (1, ..., 1)^T is the column vector of all 1's, so 1 is a right eigenvector of P with eigenvalue 1.

Transition matrices for GAGA

[Figures: M1 and M2 with edge labels replaced by probabilities, e.g., p_C + p_T.]

  P1 =                              P2 =
    3/4  1/4  0    0    0             3/4  1/4  0    0    0
    1/2  1/4  1/4  0    0             1/2  1/4  1/4  0    0
    3/4  0    0    1/4  0             3/4  0    0    1/4  0
    1/2  1/4  0    0    1/4           1/2  1/4  0    0    1/4
    3/4  0    0    1/4  0             3/4  1/4  0    0    0

Edge labels are replaced by probabilities, e.g., p_C + p_T. The matrices are shown for the case that all nucleotides have equal probabilities 1/4.
P2 (no overlaps) is obtained from P1 (overlaps allowed) by replacing the last row with a copy of the first row.

Other applications of automata

Automata / state machines are also used in other applications in Math and Computer Science. The transition weights may be defined differently, and the matrices usually aren't stochastic.

Combinatorics: Count walks through the automaton (instead of getting their probabilities) by setting the transition weights u --x--> s to 1.

Computer Science (formal languages, classifiers, ...): Does the string τ contain GAGA? Output 1 if it does, 0 otherwise. Modify M1: remove the outgoing edges on GAGA. On reaching state GAGA, terminate with output 1. If the end of τ is reached, terminate with output 0. This is called a deterministic finite acceptor (DFA).

Markov chains: Instead of considering a specific string τ, we'll compute probabilities, expected values, ... over the sample space of all strings of length n.

Other Markov chain examples

A Markov chain is kth order if the probability of X_t = i depends on the values of X_{t-1}, ..., X_{t-k}. It can be converted to a first order Markov chain by making new states that record more history.

Positional independence: Instead of a null hypothesis that a DNA sequence is generated by repeated rolls of a biased four-sided die, we could use a Markov chain. The simplest is a one-step transition matrix

  P = [ p_AA  p_AC  p_AG  p_AT ]
      [ p_CA  p_CC  p_CG  p_CT ]
      [ p_GA  p_GC  p_GG  p_GT ]
      [ p_TA  p_TC  p_TG  p_TT ]

P could be the same at all positions. In a coding region, it could be different for the first, second, and third positions of codons.

Nucleotide evolution: There are models of random point mutations over the course of evolution concerning Markov chains with the form P (same as above), in which X_t is the state A, C, G, T of the nucleotide at a given position in a sequence at time (generation) t.
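As an illustration of fitting such a null model, here is a small R sketch (an addition, not the slides' code; the sample string is just the example sequence from earlier) that estimates the one-step transition matrix from consecutive-letter counts:

    # Estimate p_xy = P(next letter = y | current letter = x) from a sequence.
    estimate_P <- function(tau) {
      alpha <- c("A", "C", "G", "T")
      x <- strsplit(tau, "")[[1]]
      counts <- table(factor(head(x, -1), levels = alpha),
                      factor(tail(x, -1), levels = alpha))
      sweep(counts, 1, rowSums(counts), "/")   # normalize each row to sum to 1
    }
    estimate_P("CAGAGGTCGAGAGT")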

Questions about Markov chains

1. What is the probability of being in a particular state after n steps?
2. What is the probability of being in a particular state as n → ∞?
3. What is the reverse Markov chain?
4. If you are in state i, what is the expected number of time steps until the next time you are in state j? What is the variance of this? What is the complete probability distribution?
5. Starting in state i, what is the expected number of visits to state j before reaching state k?

Transition probabilities after two steps

[Figure: at times t, t+1, t+2, paths from state i through each intermediate state r = 1, ..., 5 to state j, with probability P_ir on the first step and P_rj on the second.]

To compute the probability of going from state i at time t to state j at time t+2, consider all the states it could go through at time t+1:

  P(X_{t+2} = j | X_t = i)
    = Σ_r P(X_{t+1} = r | X_t = i) · P(X_{t+2} = j | X_{t+1} = r, X_t = i)
    = Σ_r P(X_{t+1} = r | X_t = i) · P(X_{t+2} = j | X_{t+1} = r)
    = Σ_r P_ir P_rj = (P²)_ij

Transition probabilities after n steps

For n ≥ 0, the transition matrix from time t to time t+n is P^n:

  P(X_{t+n} = j | X_t = i)
    = Σ_{r_1,...,r_{n-1}} P(X_{t+1} = r_1 | X_t = i) P(X_{t+2} = r_2 | X_{t+1} = r_1) ···
    = Σ_{r_1,...,r_{n-1}} P_{i r_1} P_{r_1 r_2} ··· P_{r_{n-1} j}
    = (P^n)_ij

(sum over the possible states r_1, ..., r_{n-1} at times t+1, ..., t+(n-1))

State probability vector

α_i(t) = P(X_t = i) is the probability of being in state i at time t.
Column vector α(t) = (α_1(t), ..., α_s(t))^T, or transpose it to get a row vector α(t)^T = (α_1(t), ..., α_s(t)).
The probabilities at time t+n are

  α_j(t+n) = P(X_{t+n} = j | α(t)) = Σ_i P(X_{t+n} = j | X_t = i) P(X_t = i)
           = Σ_i α_i(t) (P^n)_ij = (α(t)^T P^n)_j

so α(t+n)^T = α(t)^T P^n (row vector times matrix), or equivalently, (P^T)^n α(t) = α(t+n) (matrix times column vector).

State vector after n steps for GAGA; P = P1

  P1 =                          P1² =
    3/4  1/4  0    0    0         11/16  1/4   1/16  0     0
    1/2  1/4  1/4  0    0         11/16  3/16  1/16  1/16  0
    3/4  0    0    1/4  0         11/16  1/4   0     0     1/16
    1/2  1/4  0    0    1/4       11/16  3/16  1/16  1/16  0
    3/4  0    0    1/4  0         11/16  1/4   0     0     1/16

At t = 0, suppose there is a 1/3 chance of being in the 1st state, a 2/3 chance of being in the 2nd state, and no chance of the other states: α(0)^T = (1/3, 2/3, 0, 0, 0).
Time t = 2 is n = 2 − 0 = 2 steps later:

  α(2)^T = (1/3, 2/3, 0, 0, 0) P1² = (11/16, 5/24, 1/16, 1/24, 0)

Alternately, with column vectors: α(2) = (P1^T)² α(0) = (11/16, 5/24, 1/16, 1/24, 0)^T.
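A quick numeric check of this example in R (a sketch added here, not part of the slides; P1 is entered by hand):

    P1 <- matrix(c(3/4, 1/4,   0,   0,   0,
                   1/2, 1/4, 1/4,   0,   0,
                   3/4,   0,   0, 1/4,   0,
                   1/2, 1/4,   0,   0, 1/4,
                   3/4,   0,   0, 1/4,   0), 5, 5, byrow = TRUE)
    alpha0 <- c(1/3, 2/3, 0, 0, 0)
    alpha0 %*% P1 %*% P1   # 0.6875 0.2083 0.0625 0.0417 0 = (11/16, 5/24, 1/16, 1/24, 0)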

Transition probabilities after n steps for GAGA; P = P1

P^0 = I (the 5 × 5 identity matrix); P^1 = P1 and P² are on the previous slides; and

  P³ =                                    P⁴ =
    11/16  15/64  1/16  1/64  0             11/16  15/64  15/256  1/64  1/256
    11/16  15/64  3/64  1/64  1/64          11/16  15/64  15/256  1/64  1/256
    11/16  15/64  1/16  1/64  0             11/16  15/64  15/256  1/64  1/256
    11/16  15/64  3/64  1/64  1/64          11/16  15/64  15/256  1/64  1/256
    11/16  15/64  1/16  1/64  0             11/16  15/64  15/256  1/64  1/256

  P^n = P⁴ for n ≥ 5

Regardless of the starting state, the probabilities of being in states 1, ..., 5 at time t (when t is large enough) are 11/16, 15/64, 15/256, 1/64, 1/256.
Usually P^n just approaches a limit asymptotically as n increases, rather than reaching it. We'll see other examples later (like P2).

Matrix powers in Matlab and R

Matlab:

    >> P = [ [ 3/4, 1/4, 0, 0, 0 ];   % ∅
             [ 2/4, 1/4, 1/4, 0, 0 ]; % G
             [ 3/4, 0, 0, 1/4, 0 ];   % GA
             [ 2/4, 1/4, 0, 0, 1/4 ]; % GAG
             [ 3/4, 0, 0, 1/4, 0 ];   % GAGA
           ]
    P =
        0.7500    0.2500         0         0         0
        0.5000    0.2500    0.2500         0         0
        0.7500         0         0    0.2500         0
        0.5000    0.2500         0         0    0.2500
        0.7500         0         0    0.2500         0
    >> P * P   % or P^2
    ans =
        0.6875    0.2500    0.0625         0         0
        0.6875    0.1875    0.0625    0.0625         0
        0.6875    0.2500         0         0    0.0625
        0.6875    0.1875    0.0625    0.0625         0
        0.6875    0.2500         0         0    0.0625

R:

    > P = rbind(
    +   c(3/4, 1/4,   0,   0,   0),  # ∅
    +   c(2/4, 1/4, 1/4,   0,   0),  # G
    +   c(3/4,   0,   0, 1/4,   0),  # GA
    +   c(2/4, 1/4,   0,   0, 1/4),  # GAG
    +   c(3/4,   0,   0, 1/4,   0)   # GAGA
    + )
    > P
         [,1] [,2] [,3] [,4] [,5]
    [1,] 0.75 0.25 0.00 0.00 0.00
    [2,] 0.50 0.25 0.25 0.00 0.00
    [3,] 0.75 0.00 0.00 0.25 0.00
    [4,] 0.50 0.25 0.00 0.00 0.25
    [5,] 0.75 0.00 0.00 0.25 0.00
    > P %*% P
           [,1]   [,2]   [,3]   [,4]   [,5]
    [1,] 0.6875 0.2500 0.0625 0.0000 0.0000
    [2,] 0.6875 0.1875 0.0625 0.0625 0.0000
    [3,] 0.6875 0.2500 0.0000 0.0000 0.0625
    [4,] 0.6875 0.1875 0.0625 0.0625 0.0000
    [5,] 0.6875 0.2500 0.0000 0.0000 0.0625

Note: R doesn't have a built-in matrix power function. The > and + symbols above are prompts, not something you enter.
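Since R lacks a matrix power operator, a helper like the following is handy (a sketch added here, not from the slides):

    # P^n by repeated squaring; matpow(P, 0) returns the identity.
    matpow <- function(P, n) {
      result <- diag(nrow(P))
      while (n > 0) {
        if (n %% 2 == 1) result <- result %*% P
        P <- P %*% P
        n <- n %/% 2
      }
      result
    }
    matpow(P, 4)   # every row approaches the stationary distribution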

Stationary distribution, a.k.a. steady state distribution

If P is irreducible and aperiodic (these will be defined soon), then P^n approaches a limit with this format as n → ∞:

  lim_{n→∞} P^n = [ φ_1 φ_2 ··· φ_s ]
                  [ φ_1 φ_2 ··· φ_s ]
                  [  ⋮           ⋮  ]
                  [ φ_1 φ_2 ··· φ_s ]

In other words, no matter what the starting state, the probability of being in state j after n steps approaches φ_j.
The row vector φ = (φ_1, ..., φ_s) is called the stationary distribution of the Markov chain. It is stationary because these probabilities stay the same from one time to the next; in matrix notation, φ P = φ, or P^T φ^T = φ^T. So φ is a left eigenvector of P with eigenvalue 1.
Since it represents probabilities of being in each state, the components of φ add up to 1.

Stationary distribution — computing it for example M1

Solve φ P = φ, i.e., (φ_1, ..., φ_5) P1 = (φ_1, ..., φ_5):

  φ_1 = (3/4)φ_1 + (1/2)φ_2 + (3/4)φ_3 + (1/2)φ_4 + (3/4)φ_5
  φ_2 = (1/4)φ_1 + (1/4)φ_2 +    0·φ_3 + (1/4)φ_4 +    0·φ_5
  φ_3 =    0·φ_1 + (1/4)φ_2 +    0·φ_3 +    0·φ_4 +    0·φ_5
  φ_4 =    0·φ_1 +    0·φ_2 + (1/4)φ_3 +    0·φ_4 + (1/4)φ_5
  φ_5 =    0·φ_1 +    0·φ_2 +    0·φ_3 + (1/4)φ_4 +    0·φ_5

and the total probability equation φ_1 + φ_2 + φ_3 + φ_4 + φ_5 = 1.
This is 6 equations in 5 unknowns, so it is overdetermined. Actually, the first 5 equations are underdetermined; they add up to φ_1 + ··· + φ_5 = φ_1 + ··· + φ_5. Knock out the φ_5 equation and solve the rest of them to get

  φ = (11/16, 15/64, 15/256, 1/64, 1/256) ≈ (0.6875, 0.2344, 0.0586, 0.0156, 0.0039).

Solving the equations in Matlab or R (this method doesn't use the functions for eigenvectors)

Matlab:

    >> eye(5)   % identity
    ans =
         1     0     0     0     0
         0     1     0     0     0
         0     0     1     0     0
         0     0     0     1     0
         0     0     0     0     1
    >> P' - eye(5)   % transpose minus identity
       -0.2500    0.5000    0.7500    0.5000    0.7500
        0.2500   -0.7500         0    0.2500         0
             0    0.2500   -1.0000         0         0
             0         0    0.2500   -1.0000    0.2500
             0         0         0    0.2500   -1.0000
    >> [P' - eye(5); ones(1,5)]
       -0.2500    0.5000    0.7500    0.5000    0.7500
        0.2500   -0.7500         0    0.2500         0
             0    0.2500   -1.0000         0         0
             0         0    0.2500   -1.0000    0.2500
             0         0         0    0.2500   -1.0000
        1.0000    1.0000    1.0000    1.0000    1.0000
    >> sstate = [P' - eye(5); ones(1,5)] \ [0 0 0 0 0 1]'
    sstate =
        0.6875
        0.2344
        0.0586
        0.0156
        0.0039

R:

    > diag(1,5)   # identity
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    0    0    0    0
    [2,]    0    1    0    0    0
    [3,]    0    0    1    0    0
    [4,]    0    0    0    1    0
    [5,]    0    0    0    0    1
    > t(P) - diag(1,5)   # transpose minus identity
          [,1]  [,2]  [,3]  [,4]  [,5]
    [1,] -0.25  0.50  0.75  0.50  0.75
    [2,]  0.25 -0.75  0.00  0.25  0.00
    [3,]  0.00  0.25 -1.00  0.00  0.00
    [4,]  0.00  0.00  0.25 -1.00  0.25
    [5,]  0.00  0.00  0.00  0.25 -1.00
    > rbind(t(P) - diag(1,5), c(1,1,1,1,1))
          [,1]  [,2]  [,3]  [,4]  [,5]
    [1,] -0.25  0.50  0.75  0.50  0.75
    [2,]  0.25 -0.75  0.00  0.25  0.00
    [3,]  0.00  0.25 -1.00  0.00  0.00
    [4,]  0.00  0.00  0.25 -1.00  0.25
    [5,]  0.00  0.00  0.00  0.25 -1.00
    [6,]  1.00  1.00  1.00  1.00  1.00
    > sstate = qr.solve(rbind(t(P) - diag(1,5),
    +                         c(1,1,1,1,1)), c(0,0,0,0,0,1))
    > sstate
    [1] 0.68750000 0.23437500 0.05859375 0.01562500
    [5] 0.00390625

Eigenvalues of P

A transition matrix is stochastic: all entries are ≥ 0 and its row sums are all 1. So P·1 = 1, where 1 = (1, ..., 1)^T. Thus, λ = 1 is an eigenvalue of P and 1 is a right eigenvector.
There is also a left eigenvector of P with eigenvalue 1: w P = w, where w is a row vector. Normalize it so its entries add up to 1, to get the stationary distribution φ.
All eigenvalues λ of a stochastic matrix have |λ| ≤ 1. An irreducible aperiodic Markov chain has just one eigenvalue = 1.
The 2nd largest |λ| determines how fast P^n converges. For example, if P is diagonalizable, the spectral decomposition is

  P^n = 1^n M_1 + λ_2^n M_2 + λ_3^n M_3 + ···

but there may be complications (periodic Markov chains, complex eigenvalues, ...).
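In R, the stationary distribution can also be read off from eigen(), as an alternative to the linear-system method on the earlier slide (this snippet is an added sketch, not the slides' code):

    # Left eigenvector of P for eigenvalue 1 = right eigenvector of t(P).
    e <- eigen(t(P))
    v <- Re(e$vectors[, which.min(abs(e$values - 1))])
    phi <- v / sum(v)   # normalize the entries to sum to 1
    phi                 # 0.6875 0.2344 0.0586 0.0156 0.0039 for the GAGA chain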

Technicalities — reducibility

[Figure: a Markov chain on states 1, ..., 10, with transient states 1 and 2 feeding into an absorbing state 3 and into the closed components {4, 5, 6, 7} and {8, 9, 10}.]

A Markov chain is irreducible if every state can be reached from every other state after enough steps. The above example is reducible since there are states that cannot be reached from each other: after sufficient time, you are either stuck in state 3, the component {4, 5, 6, 7}, or the component {8, 9, 10}.

Technicalities — period

[Same figure as on the previous slide.]

State i has period d if the Markov chain can only go from state i to itself in multiples of d steps, where d is the maximum number that satisfies that. If d > 1, then state i is periodic.
A Markov chain is periodic if at least one state is periodic, and is aperiodic if no states are periodic.
All states in a component have the same period. Component {4, 5, 6, 7} has period 2 and component {8, 9, 10} has period 3, so the Markov chain is periodic.

Technicalities — absorbing states

[Same figure as on the previous slides.]

An absorbing state has all its outgoing edges going to itself; e.g., state 3 above. An irreducible Markov chain with two or more states cannot have any absorbing states.

Technicalities — summary

There are generalizations to infinite numbers of discrete or continuous states, and to continuous time. We will work with Markov chains that are finite, discrete, irreducible, and aperiodic, unless otherwise stated.
For a finite discrete Markov chain on two or more states:

  irreducible and aperiodic with no absorbing states

is equivalent to

  P or a power of P has all entries greater than 0

and in this case, lim_{n→∞} P^n exists and all its rows are the stationary distribution.
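The "some power of P is strictly positive" criterion is easy to test numerically. A small R sketch (my own illustration, with a crude cap on how many powers to try):

    # TRUE if some power P^k (k <= kmax) has all entries > 0,
    # i.e., the chain is irreducible and aperiodic.
    is_primitive <- function(P, kmax = 2 * nrow(P)^2) {
      Q <- diag(nrow(P))
      for (k in 1:kmax) {
        Q <- Q %*% P
        if (all(Q > 0)) return(TRUE)
      }
      FALSE
    }
    is_primitive(P)   # TRUE for the GAGA chain, since P^4 > 0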

Reverse Markov Chain

[Figure: M1 and its reverse; the reverse chain has the same nodes ∅, G, GA, GAG, GAGA, with every edge reversed.]

A Markov chain modeling forwards progression of time can be reversed to make predictions about the past. For example, this is done in models of nucleotide evolution.
The graph of the reverse Markov chain has the same nodes as the forwards chain, and the same edges but with reversed directions and new probabilities.

Reverse Markov Chain

The transition matrix P of the forwards Markov chain was defined so that P(X_{t+1} = j | X_t = i) = p_ij at all times t.
Assume the forwards machine has run long enough to reach the stationary distribution, P(X_t = i) = φ_i.
The reverse Markov chain has transition matrix Q, where

  q_ij = P(X_t = j | X_{t+1} = i)
       = P(X_{t+1} = i | X_t = j) P(X_t = j) / P(X_{t+1} = i)
       = p_ji φ_j / φ_i

(Recall Bayes' Theorem: P(B|A) = P(A|B) P(B) / P(A).)

Reverse Markov Chain of M1

[Figure: M1 and its reverse, as before.]

  P1 =                          Q =
    3/4  1/4  0    0    0         3/4    15/88  45/704  1/88  3/704
    1/2  1/4  1/4  0    0         11/15  1/4    0       1/60  0
    3/4  0    0    1/4  0         0      1      0       0     0
    1/2  1/4  0    0    1/4       0      0      15/16   0     1/16
    3/4  0    0    1/4  0         0      0      0       1     0

The stationary distribution of P1 is φ = (11/16, 15/64, 15/256, 1/64, 1/256).
Example of one entry: The edge ∅ → GA in the reverse chain has q_13 = p_31 φ_3 / φ_1 = (3/4)(15/256)/(11/16) = 45/704. This means that in the steady state of the forwards chain, when ∅ is entered, there is a probability 45/704 that the previous state was GA.

Matlab and R

Matlab:

    >> d_sstate = diag(sstate)
    d_sstate =
        0.6875         0         0         0         0
             0    0.2344         0         0         0
             0         0    0.0586         0         0
             0         0         0    0.0156         0
             0         0         0         0    0.0039
    >> Q = inv(d_sstate) * P' * d_sstate
    Q =
        0.7500    0.1705    0.0639    0.0114    0.0043
        0.7333    0.2500         0    0.0167         0
             0    1.0000         0         0         0
             0         0    0.9375         0    0.0625
             0         0         0    1.0000         0

R:

    > d_sstate = diag(sstate)
    > Q = solve(d_sstate) %*% t(P) %*% d_sstate

Expected time from state i till the next time in state j

[Figure: automaton M1 as before.]

If M1 is in state ∅, what is the expected number of steps until the next time it is in state GAGA?
More generally, what's the expected number of steps from state i to state j?
Fix the end state j once and for all. Simultaneously solve for the expected number of steps from all start states i.
For i = 1, ..., s, let N_i be a random variable for the number of steps from state i to the next time in state j. "Next time" means that if i = j, we count until the next time at state j, with N_j ≥ 1; we don't count it as already there in 0 steps.
We'll develop systems of equations for E(N_i), Var(N_i), and P_{N_i}(x).

Expected time from state i till the next time in state j

[Figure: from state i = 1 at time t, one step to each state r = 1, ..., 5 with probability P_1r; dotted paths from r onward to j = 5 take N_r further steps.]

Fix j = 5. Random variable N_r = number of steps from state r to the next time in state j. The dotted paths have no occurrences of state j in the middle.
Expected number of steps from state i = 1 to j = 5 (repeat this for all i), conditioning on the state at time t+1:

  E(N_1) = P_11 E(N_1 + 1) + P_12 E(N_2 + 1) + P_13 E(N_3 + 1) + P_14 E(N_4 + 1) + P_15 E(1)

Both N_1's have the same distribution, and we can expand the expectations:

  E(N_1) = Σ_r P_1r + Σ_{r: r ≠ j} P_1r E(N_r) = 1 + Σ_{r: r ≠ j} P_1r E(N_r)

Expected time from state i till the next time in state j

Recall we fixed j, and defined N_i relative to it. Start in state i. There is a probability P_ir of going one step to state r.
If r = j, we are done in one step: E(N_i | 1st step is i → j) = 1.
If r ≠ j, the expected number of steps after the first step is E(N_r): E(N_i | 1st step is i → r) = E(N_r + 1) = E(N_r) + 1.
Combine with the probability of each value of r:

  E(N_i) = P_ij · 1 + Σ_{r=1, r≠j}^s P_ir E(N_r + 1)
         = P_ij + Σ_{r=1, r≠j}^s P_ir (E(N_r) + 1)
         = Σ_{r=1}^s P_ir + Σ_{r=1, r≠j}^s P_ir E(N_r)
         = 1 + Σ_{r=1, r≠j}^s P_ir E(N_r)

Doing this for all s states, i = 1, ..., s, gives s equations in the s unknowns E(N_1), ..., E(N_s).

Expected times between states in M1: times to state 5

  E(N_1) = 0 + (3/4)(E(N_1)+1) + (1/4)(E(N_2)+1)                   = 1 + (3/4)E(N_1) + (1/4)E(N_2)
  E(N_2) = 0 + (1/2)(E(N_1)+1) + (1/4)(E(N_2)+1) + (1/4)(E(N_3)+1) = 1 + (1/2)E(N_1) + (1/4)E(N_2) + (1/4)E(N_3)
  E(N_3) = 0 + (3/4)(E(N_1)+1) + (1/4)(E(N_4)+1)                   = 1 + (3/4)E(N_1) + (1/4)E(N_4)
  E(N_4) = (1/4)·1 + (1/2)(E(N_1)+1) + (1/4)(E(N_2)+1)             = 1 + (1/2)E(N_1) + (1/4)E(N_2)
  E(N_5) = 0 + (3/4)(E(N_1)+1) + (1/4)(E(N_4)+1)                   = 1 + (3/4)E(N_1) + (1/4)E(N_4)

This is 5 equations in 5 unknowns E(N_1), ..., E(N_5). Matrix format:

  [  1/4  -1/4   0    0   0 ] [E(N_1)]   [1]
  [ -1/2   3/4  -1/4  0   0 ] [E(N_2)]   [1]
  [ -3/4   0     1   -1/4 0 ] [E(N_3)] = [1]
  [ -1/2  -1/4   0    1   0 ] [E(N_4)]   [1]
  [ -3/4   0     0   -1/4 1 ] [E(N_5)]   [1]

(Coefficient matrix: zero out the jth column of P, then subtract the result from the identity matrix.)

Solution: E(N_1) = 272, E(N_2) = 268, E(N_3) = 256, E(N_4) = 204, E(N_5) = 256.
Matlab and R: Enter matrix C and vector r. Solve C x = r with
Matlab: x = C \ r  or  x = inv(C) * r
R: x = solve(C, r)
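Carrying out that recipe in R (a sketch, reusing the P1 entered earlier; not the slides' own code):

    j <- 5
    P0 <- P1; P0[, j] <- 0    # zero out the jth column of P
    C <- diag(5) - P0         # coefficient matrix I - P0
    r <- rep(1, 5)
    EN <- solve(C, r)
    EN                        # 272 268 256 204 256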

Variance and PGF of the number of steps between states

We may compute E(g(N_i)) for any function g by setting up recurrences in the same way. For each i = 1, ..., s:

  E(g(N_i)) = P_ij g(1) + Σ_{r≠j} P_ir E(g(N_r + 1)) = expansion depending on g

Variance of the N_i's: Var(N_i) = E(N_i²) − (E(N_i))²

  E(N_i²) = P_ij · 1² + Σ_{r=1, r≠j}^s P_ir E((N_r + 1)²)
          = 1 + 2 Σ_{r=1, r≠j}^s P_ir E(N_r) + Σ_{r=1, r≠j}^s P_ir E(N_r²)

Plug in the previous solution of E(N_1), ..., E(N_s). Then solve the s equations for the s unknowns E(N_1²), ..., E(N_s²).

PGF: P_{N_i}(x) = E(x^{N_i}) = Σ_{n=0}^∞ P(N_i = n) x^n

  E(x^{N_i}) = P_ij x + Σ_{r=1, r≠j}^s P_ir E(x^{N_r + 1}) = P_ij x + Σ_{r=1, r≠j}^s P_ir x E(x^{N_r})

Solve the s equations for the s unknowns E(x^{N_1}), ..., E(x^{N_s}). See the old handout for a worked out example.
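Continuing the R sketch for the second moments (an added illustration; it reuses C, P0, and EN from the snippet above):

    r2 <- 1 + 2 * as.vector(P0 %*% EN)   # RHS: 1 + 2 Σ_{r≠j} P_ir E(N_r)
    EN2 <- solve(C, r2)                  # E(N_i^2)
    as.vector(EN2) - EN^2                # Var(N_i)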

Powers of matrices (see separate slides)

Sample matrix:

  P = [ 8  -1 ]
      [ 6   3 ]

Diagonalization: P = V D V⁻¹ with

  V = [ 1  2 ]    D = [ 5  0 ]    V⁻¹ = [ -2     1   ]
      [ 3  4 ]        [ 0  6 ]          [ 3/2  -1/2 ]

  P^n = (V D V⁻¹)(V D V⁻¹) ··· (V D V⁻¹) = V D^n V⁻¹ = [ 1  2 ] [ 5^n  0  ] [ -2     1   ]
                                                       [ 3  4 ] [ 0   6^n ] [ 3/2  -1/2 ]

When a square (s × s) matrix P has distinct eigenvalues, it can be diagonalized as P = V D V⁻¹, where
- D is a diagonal matrix of the eigenvalues of P (in any order);
- the columns of V are right eigenvectors of P (in the same order as D);
- the rows of V⁻¹ are left eigenvectors of P (in the same order as D).
If any eigenvalues are equal, it may or may not be diagonalizable, but there is a generalization called the Jordan Canonical Form, P = V J V⁻¹, giving P^n = V J^n V⁻¹. J has the eigenvalues on the diagonal and 1's and 0's just above it, and is also easy to raise to powers.

Matrix powers — spectral decomposition (distinct eigenvalues)

Powers of P: P^n = (V D V⁻¹)(V D V⁻¹) ··· = V D^n V⁻¹, and we can split D^n into one piece per eigenvalue:

  P^n = V [ 5^n  0  ] V⁻¹ = V [ 5^n  0 ] V⁻¹ + V [ 0   0  ] V⁻¹
          [ 0   6^n ]         [ 0    0 ]         [ 0  6^n ]

  V [ 5^n 0; 0 0 ] V⁻¹ = [ 1  2 ] [ 5^n  0 ] [ -2     1   ] = [ (1)(5^n)(-2)  (1)(5^n)(1) ] = 5^n [ -2  1 ] = λ_1^n r_1 l_1
                         [ 3  4 ] [ 0    0 ] [ 3/2  -1/2 ]    [ (3)(5^n)(-2)  (3)(5^n)(1) ]       [ -6  3 ]

  with r_1 = (1, 3)^T and l_1 = (-2, 1), and

  V [ 0 0; 0 6^n ] V⁻¹ = [ 2(6^n)(3/2)  2(6^n)(-1/2) ] = 6^n [ 3  -1 ] = λ_2^n r_2 l_2
                         [ 4(6^n)(3/2)  4(6^n)(-1/2) ]       [ 6  -2 ]

  with r_2 = (2, 4)^T and l_2 = (3/2, -1/2).

Spectral decomposition of P^n:

  P^n = V D^n V⁻¹ = λ_1^n r_1 l_1 + λ_2^n r_2 l_2 = 5^n [ -2  1 ] + 6^n [ 3  -1 ]
                                                        [ -6  3 ]       [ 6  -2 ]
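A quick numeric check of this decomposition in R (an added sketch; A is the 2 × 2 sample matrix above, renamed to avoid clobbering the chain's P):

    A <- matrix(c(8, -1, 6, 3), 2, 2, byrow = TRUE)
    e <- eigen(A)                    # eigenvalues 6 and 5
    V <- e$vectors; Vi <- solve(V)
    V %*% diag(e$values^7) %*% Vi    # A^7 via the decomposition
    Reduce(`%*%`, replicate(7, A, simplify = FALSE))   # same, by brute force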

Jordan Canonical Form

Matrices with two or more equal eigenvalues cannot necessarily be diagonalized. Matlab and R do not give an error or warning.
The Jordan Canonical Form is a generalization that turns into diagonalization when possible, and still works otherwise:

  P = V J V⁻¹ with J = diag(B_1, B_2, B_3, ...) block diagonal, so
  P^n = V J^n V⁻¹ with J^n = diag(B_1^n, B_2^n, B_3^n, ...)

Each Jordan block B_i has an eigenvalue λ_i on its diagonal and 1's just above the diagonal:

  B_i = [ λ_i  1    0    ···  0    0   ]
        [ 0    λ_i  1    ···  0    0   ]
        [ ⋮               ⋱        ⋮  ]
        [ 0    0    0    ···  λ_i  1   ]
        [ 0    0    0    ···  0    λ_i ]

  B_i^n = [ λ_i^n  (n choose 1) λ_i^{n-1}  (n choose 2) λ_i^{n-2}  ··· ]
          [ 0      λ_i^n                   (n choose 1) λ_i^{n-1}  ··· ]
          [ ⋮                                ⋱                         ]
          [ 0      0                       0                 ···  λ_i^n ]

In applications when repeated eigenvalues are a possibility, it's best to use the Jordan Canonical Form.

Jordan Canonical Form for P in Matlab

(R doesn't currently have the JCF available, either built-in or as an add-on.)

    >> P = [ [ 3/4, 1/4, 0, 0, 0 ];   % ∅
             [ 2/4, 1/4, 1/4, 0, 0 ]; % G
             [ 3/4, 0, 0, 1/4, 0 ];   % GA
             [ 2/4, 1/4, 0, 0, 1/4 ]; % GAG
             [ 3/4, 0, 0, 1/4, 0 ];   % GAGA
           ]
    >> [V,J] = jordan(P)
    V =
       -0.0039   -0.0195   -0.0707   -0.2298   -0.0430
        0.0117    0.0430    0.1339    0.4066   -0.0430
       -0.0039    0.0430    0.1793    0.5884   -0.0430
        0.0117    0.0430    0.3839    1.4066   -0.0430
       -0.0039    0.0430    0.1793    1.5884   -0.0430
    J =
         0     1     0     0     0
         0     0     1     0     0
         0     0     0     1     0
         0     0     0     0     0
         0     0     0     0     1
    >> Vi = inv(V)
    Vi =
              0   52.3636  -64.0000   11.6364         0
       -16.0000   16.0000   13.0909  -16.0000    2.9091
              0   -4.0000    4.0000    4.0000   -4.0000
              0         0   -1.0000         0    1.0000
       -16.0000   -5.4545   -1.3636   -0.3636   -0.0909
    >> V * J * Vi
    ans =
        0.7500    0.2500   -0.0000   -0.0000    0.0000
        0.5000    0.2500    0.2500   -0.0000    0.0000
        0.7500   -0.0000   -0.0000    0.2500   -0.0000
        0.5000    0.2500    0.0000   -0.0000    0.2500
        0.7500   -0.0000   -0.0000    0.2500   -0.0000

Powers of P using the JCF

P = V J V⁻¹ gives P^n = V J^n V⁻¹, and J^n is easy to compute:

  J =            J² =           J³ =           J⁴ =           J⁵ =
    0 1 0 0 0      0 0 1 0 0      0 0 0 1 0      0 0 0 0 0      0 0 0 0 0
    0 0 1 0 0      0 0 0 1 0      0 0 0 0 0      0 0 0 0 0      0 0 0 0 0
    0 0 0 1 0      0 0 0 0 0      0 0 0 0 0      0 0 0 0 0      0 0 0 0 0
    0 0 0 0 0      0 0 0 0 0      0 0 0 0 0      0 0 0 0 0      0 0 0 0 0
    0 0 0 0 1      0 0 0 0 1      0 0 0 0 1      0 0 0 0 1      0 0 0 0 1

For this matrix, J^n = J⁴ when n ≥ 4, so P^n = V J^n V⁻¹ = V J⁴ V⁻¹ = P⁴ for n ≥ 4.

Non-overlapping occurrences of GAGA

[Figure: automaton M2, whose transition matrix is P2.]

  P2 =
    3/4  1/4  0    0    0
    1/2  1/4  1/4  0    0
    3/4  0    0    1/4  0
    1/2  1/4  0    0    1/4
    3/4  1/4  0    0    0

    >> [V2,J2] = jordan(P2)
    V2 =
       -0.0625   -0.5170   -0.1728    0.1176 + 0.0294i   0.1176 - 0.0294i
        0.1875    1.3010   -0.1728   -0.3824 + 0.0294i  -0.3824 - 0.0294i
       -0.0625    0.4830   -0.1728    0.1176 - 0.4706i   0.1176 + 0.4706i
        0.1875    1.3010   -0.1728    0.1176 + 0.0294i   0.1176 - 0.0294i
       -0.0625    0.4830   -0.1728    0.1176 + 0.0294i   0.1176 - 0.0294i
    J2 =
        0   1.0000        0             0             0
        0        0        0             0             0
        0        0   1.0000             0             0
        0        0        0   0 + 0.2500i             0
        0        0        0             0   0 - 0.2500i
    >> V2i = inv(V2)
    V2i =
        3.2727        0        0        4.0000   -7.2727
       -1.0000        0        0             0    1.0000
       -3.9787  -1.3617  -0.3404       -0.0851   -0.0213
             0  -1.0000   0 + 1.0000i   1.0000   0 - 1.0000i
             0  -1.0000   0 - 1.0000i   1.0000   0 + 1.0000i

Non-overlapping occurrences of GAGA — JCF

  (J2)^n = diag( [0 1; 0 0]^n , 1^n , (i/4)^n , (-i/4)^n ), and [0 1; 0 0]^n = [0 0; 0 0] for n ≥ 2.

One eigenvalue = 1. It's the third one listed, so the stationary distribution is the third row of (V2)⁻¹, normalized:

    >> V2i(3,:) / sum(V2i(3,:))
    ans =
        0.6875    0.2353    0.0588    0.0147    0.0037

Two eigenvalues = 0. The interpretation of one of them is that the first and last rows of P2 are equal, so (1, 0, 0, 0, -1) is a left eigenvector of P2 with eigenvalue 0.
Two complex eigenvalues, 0 ± i/4. Since P2 is real, all complex eigenvalues must come in conjugate pairs. The eigenvectors also come in conjugate pairs (last 2 columns of V2; last 2 rows of (V2)⁻¹).
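R's eigen() reports the same eigenvalues for P2; R has no Jordan form, but the eigenvalues alone already show the structure (an added sketch, reusing P1 from earlier):

    P2 <- P1; P2[5, ] <- P2[1, ]    # non-overlap chain: last row copies first row
    round(eigen(P2)$values, 4)      # 1, 0+0.25i, 0-0.25i, 0, 0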

Spectral decomposition with JCF and complex eigenvalues

  (J2)^n = diag( [0 1; 0 0]^n , 1^n , (i/4)^n , (-i/4)^n ), and the 2 × 2 block [0 1; 0 0]^n = [0 0; 0 0] for n ≥ 2.

  (P2)^n = (V2)(J2)^n(V2)⁻¹
         = [ r_1 r_2 ] [0 1; 0 0]^n [ l_1 ; l_2 ] + r_3 (1)^n l_3 + r_4 (i/4)^n l_4 + r_5 (-i/4)^n l_5

where r_1, ..., r_5 are the columns of V2 and l_1, ..., l_5 are the rows of (V2)⁻¹.
The first term vanishes when n ≥ 2, so when n ≥ 2 the format is

  (P2)^n = 1^n S3 + (i/4)^n S4 + (-i/4)^n S5 = S3 + (i/4)^n S4 + (-i/4)^n S5

Spectral decomposition with JCF and complex eigenvalues

For n ≥ 2, (P2)^n = S3 + (i/4)^n S4 + (-i/4)^n S5, where

    >> S3 = V2(:,3) * V2i(3,:)
    S3 =
        0.6875    0.2353    0.0588    0.0147    0.0037
        0.6875    0.2353    0.0588    0.0147    0.0037
        0.6875    0.2353    0.0588    0.0147    0.0037
        0.6875    0.2353    0.0588    0.0147    0.0037
        0.6875    0.2353    0.0588    0.0147    0.0037
    >> S4 = V2(:,4) * V2i(4,:)
    S4 =
        0  -0.1176 - 0.0294i  -0.0294 + 0.1176i   0.1176 + 0.0294i   0.0294 - 0.1176i
        0   0.3824 - 0.0294i  -0.0294 - 0.3824i  -0.3824 + 0.0294i   0.0294 + 0.3824i
        0  -0.1176 + 0.4706i   0.4706 + 0.1176i   0.1176 - 0.4706i  -0.4706 - 0.1176i
        0  -0.1176 - 0.0294i  -0.0294 + 0.1176i   0.1176 + 0.0294i   0.0294 - 0.1176i
        0  -0.1176 - 0.0294i  -0.0294 + 0.1176i   0.1176 + 0.0294i   0.0294 - 0.1176i
    >> S5 = V2(:,5) * V2i(5,:)
    S5 =
        0  -0.1176 + 0.0294i  -0.0294 - 0.1176i   0.1176 - 0.0294i   0.0294 + 0.1176i
        0   0.3824 + 0.0294i  -0.0294 + 0.3824i  -0.3824 - 0.0294i   0.0294 - 0.3824i
        0  -0.1176 - 0.4706i   0.4706 - 0.1176i   0.1176 + 0.4706i  -0.4706 + 0.1176i
        0  -0.1176 + 0.0294i  -0.0294 - 0.1176i   0.1176 - 0.0294i   0.0294 + 0.1176i
        0  -0.1176 + 0.0294i  -0.0294 - 0.1176i   0.1176 - 0.0294i   0.0294 + 0.1176i

S3 corresponds to the stationary distribution. S4 and S5 are complex conjugates, so (i/4)^n S4 + (-i/4)^n S5 is a sum of two complex conjugates; thus, it is real-valued, even though complex numbers are involved in the computation.