Markov chains (week 6) Solutions


1 Ranking of nodes in graphs

A Markov chain model. The stochastic process of agent visits A_n is a Markov chain (MC). Explain.

The stochastic process of agent visits A_n is a Markov chain because it is memoryless (the next node depends only on the current node, not on the history of visited nodes) and time-invariant (the probability of moving from node i to node j does not change with time).

Give conditions for the following statements to be true:

State i, meaning a visit of the agent to node i, of this MC is transient.

Since the MC is finite, the only way a state i can be transient is that some state j can be reached from i through a path of nonzero transition probabilities, but i cannot be reached back from j. This could happen if node i has no nodes in its incoming neighborhood, or if the nodes in its incoming neighborhood have no nodes in their incoming neighborhoods, and so on. However, here the transitions are reciprocal: if j can be reached from i through a path (of potentially multiple transitions), then by traversing the same path in the reverse direction, i can also be reached from j. Thus none of the states here is transient.

All states of this MC are transient.

In a finite MC it is impossible for all of the states to be transient. Because the MC is finite, the outside agent must return to at least some of the nodes if it keeps moving for a sufficiently large number of time steps.

All the states of this MC are recurrent.

We argued that none of the states is transient; thus all of the states are recurrent, by definition. This means that as n → ∞ the outside agent returns to its starting node (and to every node in that node's class) over and over again.

All states of this MC are aperiodic.

Note that periodicity (and the period itself) is a class property. Because transitions are reciprocal, one can always return to the starting state in any even number of steps (step to a neighbor and back). So if there is at least one loop of odd length L within a class, that class is aperiodic: the possible return times then include both 2 and L, and gcd(2, L) = 1, so the period is 1. Note that we have assumed there are no self-loops (unless the class consists of only one state).

All states are positive recurrent.

Note again that positive recurrence is a class property. (Positive recurrence of all of the states should not be confused with positive recurrence of the whole MC, which additionally requires irreducibility.) In a finite MC, every recurrent state is positive recurrent. We argued that all of the states are recurrent; thus all of the states are positive recurrent.

All states are ergodic.

Note that ergodicity is a class property. (Similarly, ergodicity of all of the states should not be confused with ergodicity of the whole MC, which requires irreducibility.) A state is ergodic iff it is aperiodic and positive recurrent. We argued that all of the states are positive recurrent; thus, if every class either consists of a single state or contains an odd-length loop, all of the states are ergodic.

The MC is irreducible.

By definition, every state must communicate with every other state, i.e., there must be a path of nonzero transition probabilities between any two states. Thus there should be no singleton states (loners), nor any disjoint friendship classes.
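As a numerical sanity check for the class-structure conditions above, one can test irreducibility and aperiodicity directly from the adjacency matrix. The following MATLAB sketch is our own addition (not part of the assignment) and assumes graph is the symmetric 0/1 connectivity matrix used in part B. It uses two facts: the chain is irreducible iff every state is reachable from every other in at most J-1 steps, and a chain with reciprocal links has period dividing 2, so it is aperiodic iff the graph contains a closed walk of odd length (for a reducible graph, apply the second test per class).

% Sanity checks on a symmetric 0/1 adjacency matrix "graph" (J-by-J).
J = min(size(graph));
A = graph > 0;                      % logical adjacency
R = logical(eye(J)) | A;            % states reachable in <= 1 step
for k = 2:J-1
    R = R | (double(R) * double(A) > 0);   % reachable in <= k steps
end
is_irreducible = all(R(:));

% Aperiodicity: with reciprocal links the period divides 2, so the
% chain is aperiodic iff some closed walk has odd length, i.e.
% trace(A^m) > 0 for some odd m <= J.
is_aperiodic = false;
for m = 1:2:J
    if trace(double(A)^m) > 0
        is_aperiodic = true;
        break
    end
end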

B Implement random walk. First, we provide a function which calculates the ranks for a given legitimate connectivity graph. The starting point is i = 1:

function ranks = ranks_by_random_walk(graph, N, initial_state)
J = min(size(graph));            % J : total number of states
nr_neighbors = sum(graph);       % number of neighbours of each state
% Here we construct a matrix which contains the indices of the states
% that each state is linked to. This matrix, called "Neighbours", will
% serve as a lookup table to find out to which state exactly the random
% walker should go. Specifically, Neighbours(m,n) = f, with f nonzero,
% means that state m is linked to state f.
k = max(nr_neighbors);
Neighbours = zeros(J, k);
for i = 1:J
    temp = find(graph(i,:));
    Neighbours(i, 1:length(temp)) = temp;
end
i = initial_state;               % starting state
nr_visits = zeros(J, 1);
for n = 1:N
    nr_visits(i) = nr_visits(i) + 1;
    i = Neighbours(i, randi(nr_neighbors(i)));  % jump to a uniformly chosen neighbour
end
ranks = nr_visits / N;
end

Now we call the function for the non-augmented graph, which is reducible; that is, there are disjoint friendship classes among the nodes. Results are depicted in fig. 1.

clc; close all; clear all;
graph = [ ... ];          % the given adjacency matrix (entries omitted)
% The next three lines make "graph" a legitimate connectivity graph
graph = graph + graph';   % to ensure "graph" is symmetric
graph = graph > 0;        % to ensure "graph" is all zeros and ones
graph = graph * 1;        % to ensure entries are numeric, not logical
J = min(size(graph));
figure
for j = 1:3
    N = 10^(4+j);
    tic
    ranks = ranks_by_random_walk(graph, N, 1);
    t = toc
    subplot(3,1,j)
    bar(1:J, ranks, 'r')
    title(['N=10^', num2str(j+4), ', Computation time is ', ...
           num2str(t), ' sec'], 'FontSize', 12)
end
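For completeness, the reason this time average converges to the ranks is the ergodic theorem for Markov chains; we state it here as a formula (a standard fact, not from the solution itself). For the reducible graph it applies within the communication class of the starting state:

\[
  \frac{1}{N}\sum_{n=1}^{N} \mathbf{1}\{A_n = i\}
  \;\xrightarrow[\;N\to\infty\;]{\text{a.s.}}\; \pi_i ,
\]

where π is the stationary distribution of the (class of the) chain.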

We can investigate the disjoint classes by looking at the resulting ranks in fig. 1. We observe that some of the states receive zero rank (states 6, 11, 16, 28, 33); these are states that do not belong to the class of state 1. Let us change the initial state to each one of these. Setting the starting state to 6 yields fig. 2, which shows that states 6, 16 and 33 constitute a separate class, in which state 6 is in the middle. Changing the starting state to 11 and then to 28 shows that each of them is a singleton state (a loner) and hence a separate class.

In order to add the professor (the fully connected node) to the set, we just need to augment the graph with a new node, as in the following (the only changes are the 2nd and 3rd lines, which add an extra row and column consisting of all ones). The resulting figure is fig. 3.

clc; close all; clear all;
graph = [ ... ];          % the given adjacency matrix (entries omitted)
J = min(size(graph));
graph = [graph ones(J,1); ones(1,J) 0];   % augment with the fully connected node
% The next three lines make "graph" a legitimate connectivity graph
graph = graph + graph';   % to ensure "graph" is symmetric
graph = graph > 0;        % to ensure "graph" is all zeros and ones
graph = graph * 1;        % to ensure entries are numeric, not logical
J = min(size(graph));
figure
for j = 1:3
    N = 10^(4+j);
    tic
    ranks = ranks_by_random_walk(graph, N, 1);
    t = toc
    subplot(3,1,j)
    bar(1:J, ranks, 'r')
    title(['N=10^', num2str(j+4), ', Computation time is ', ...
           num2str(t), ' sec'], 'FontSize', 12)
end

C Probability update. Following slides 19 and 20 of "ranking nodes in graphs", the probability update rule is simply

p(n+1) = P^T p(n),

where P is the transition probability matrix and p(n) is the vector of probabilities of being in each of the states at time n.
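To make the update rule concrete, here is a toy two-state example with made-up numbers (our own illustration, not from the assignment): take P(1,:) = [0, 1] and P(2,:) = [1/2, 1/2], and start the walker at state 1 with certainty, p(0) = [1; 0]. One update gives

\[
  p(1) = P^{T} p(0)
       = \begin{pmatrix} 0 & \tfrac12 \\ 1 & \tfrac12 \end{pmatrix}
         \begin{pmatrix} 1 \\ 0 \end{pmatrix}
       = \begin{pmatrix} 0 \\ 1 \end{pmatrix},
\]

i.e., after one step the walker is at state 2 with certainty, as expected since state 1 always moves to state 2.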

D Find ranks using the probability update. The limiting probabilities lim_{n→∞} p_i(n) exist as long as the MC is not periodic (when it is periodic, the probabilities oscillate). Since the MC here is aperiodic, these limits always exist (slides 77 to 82 of "markov chains"). However, for these limits to be independent of the initial distribution, the MC must be irreducible, positive recurrent, and aperiodic, hence an ergodic MC (refer to slide 6 of "markov chains"). By adding the fully connected node, the MC becomes irreducible and (since it is aperiodic) ergodic. If p_A(0) = 1, that is, the outside agent begins at node A with certainty, then the rank is

r_i(A) = lim_{n→∞} p_i(n),

because r = π, where π depends on the starting node. The modified MATLAB code for computing the ranks using r_i = lim_{n→∞} p_i(n) follows.

function ranks = ranks_by_probability_update(graph, N, initial_distribution)
J = min(size(graph));               % J : total number of states
nr_neighbors = sum(graph);          % number of neighbours of each state
transition_probabilities = graph;
for k = 1:J
    if nr_neighbors(k) > 0
        transition_probabilities(k,:) = graph(k,:) / nr_neighbors(k);
    else
        transition_probabilities(k,k) = 1;   % absorbing if no neighbours
    end
end
pi_d = initial_distribution;        % given initial distribution
for n = 1:N
    pi_d = transition_probabilities' * pi_d;
end
ranks = pi_d;
end

The results are found in figs. 4 and 5. The calculation times are reported in the figures as well; they are much smaller than the calculation times of the random walk approach of part B. And if you, like me, are wondering why the elapsed time decreases for the two larger values of N as compared with the smallest, let me know what you figure out!

E Recast as a system of linear equations. Focusing on the modified, irreducible graph, and following slide 64 of "markov chains", we can compute the ranks by solving a system of linear equations. We know that for a MC

π = P^T π,  π^T 1 = 1,   i.e.,   (I − P^T) π = 0,  π^T 1 = 1.

This means that π is an eigenvector of (I − P^T) with eigenvalue 0; in other words, π spans the null space of (I − P^T). In our ergodic MC, r = π, so we can solve the above system of equations for π to find r. The code follows; the resulting figure is fig. 6.

figure
tic
pi = null(eye(J) - transition_probabilities');
ranks = pi / sum(pi);
t = toc
bar(1:J, ranks, 'r')
title(['Recasting as Linear system problem, Computation time is ', ...
       num2str(t), ' sec'], 'FontSize', 12)
xlabel('states', 'FontSize', 12)
ylabel('ranks', 'FontSize', 12)
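As an aside, the same system can be solved without the null-space routine by stacking the normalization constraint under (I − P^T)π = 0 and using MATLAB's backslash operator; the stacked system is consistent, so the least-squares solution is exactly π. A minimal sketch (our own variant, assuming transition_probabilities and J are defined as above):

% Stack (I - P') pi = 0 with sum(pi) = 1 and solve in one shot.
A_sys = [eye(J) - transition_probabilities'; ones(1, J)];
b_sys = [zeros(J, 1); 1];
ranks_ls = A_sys \ b_sys;     % least-squares solution = stationary pi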

F Recast as an eigenvalue problem. From the equations in part E, we can equivalently compute the eigenvector π associated with eigenvalue 1 of P^T (refer to slide 18 of "ranking nodes in graphs"). The code follows; the result is depicted in fig. 7. Note that eig does not guarantee any particular ordering of the eigenvalues, so we explicitly pick the eigenvalue closest to 1.

tic
[V, D] = eig(transition_probabilities');
[~, idx] = min(abs(diag(D) - 1));   % eigenvalue closest to 1
t = toc
ranks = V(:, idx) / sum(V(:, idx));

G Discuss advantages of each method. (Refer to slides 18 and 22-28 of "ranking nodes in graphs".)

Random walk. The random walk approach is preferable in that it is secure: information is not shared between nodes, and for a node to find its rank it only needs to know how many neighbors it has and how many time steps have passed. This also means that the implementation can be distributed; the information does not have to be compiled and assessed in a central location, but each node can determine its rank individually.

Probability update (using Markov chain ideas). The probability update, which relies on probability propagation, is similar to the random walk in that the implementation can be distributed, and it is fairly secure (each node only has information from each of its neighbors about their neighbors). The main advantage is that it converges to the true ranks significantly faster than the random walk implementation: at each iteration the ranks of all of the states are updated, unlike the random walk approach, where each iteration updates the rank of only one state. Also, if the MC is very large and the program is halted at some iteration, it still provides an approximation for the ranks of all of the states, unlike the linear-system or eigenvector approaches, where no solution, not even an approximate one, is available before the final answer. We also observed that the algorithm reduces to a simple matrix-times-vector multiplication, for which fast algorithms have been investigated (e.g., this is the specialty of MATLAB, which is designed to carry out such matrix computations exceptionally fast and can handle large inputs). (The advantages of a distributed implementation of the probability update algorithm are discussed in slide 28.)

System of linear equations. Using a system of linear equations eliminates the problem of slow convergence because it does not iterate. The problem is that it compromises security, because the ranks are all computed in one place, so information from every node must be collected. Also, for large networks it is very costly, because the matrices become too large to handle directly, and there is no way to get an approximate answer along the way.

Eigenvectors/eigenvalues. The eigenvector approach is very similar to the system-of-linear-equations approach, with the one advantage that it is computationally simpler: rather than finding the null space of (I − P^T), we directly determine the eigenvector of P^T associated with eigenvalue 1.
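One refinement worth noting for the probability update of part D: instead of fixing the number of iterations N in advance, iterate until successive distributions stop changing. A minimal sketch of this stopping rule (our own addition; the tolerance tol is a hypothetical parameter, and transition_probabilities is assumed to be the row-stochastic matrix built above):

% Power iteration with a convergence test instead of a fixed N.
tol = 1e-10;                     % hypothetical tolerance
pi_d = ones(J, 1) / J;           % start from the uniform distribution
while true
    pi_next = transition_probabilities' * pi_d;
    if norm(pi_next - pi_d, 1) < tol    % L1 change between iterates
        break
    end
    pi_d = pi_next;
end
ranks = pi_next;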

Fig. 1. Part B (1), reducible graph: ranks calculated by random walk, taking the time average of the number of visits to each state. Computation times (roughly 0.5 s, 2.3 s, and 22 s) are reported in the panel titles for N = 10^5, 10^6, 10^7, where N is the total duration of the experiment. Starting state: 1.

Fig. 2. Part B (2), reducible graph: ranks calculated by random walk, taking the time average of the number of visits to each state. Computation times (roughly 0.2 s, 2.2 s, and 22 s) are reported in the panel titles for N = 10^5, 10^6, 10^7, where N is the total duration of the experiment. Starting state: 6.

Fig. 3. Part B, irreducible graph: ranks calculated by random walk, taking the time average of the number of visits to each state. Computation times (roughly 0.5 s, 2.2 s, and 22 s) are reported in the panel titles for N = 10^5, 10^6, 10^7, where N is the total duration of the experiment.

Fig. 4. Part D, reducible graph: ranks calculated by updating the probabilities. Computation times (all well under a second) are reported in the panel titles for N = 5, 10, 15, where N is the total number of iterations. The initial distribution is [1; 0; ...; 0].

Fig. 5. Part D: ranks calculated by updating the probabilities. Computation times (all well under a second) are reported in the panel titles for N = 5, 10, 15, where N is the total number of iterations. The initial distribution is [1/J; 1/J; ...; 1/J].

Fig. 6. Part E, irreducible graph: ranks (y-axis) over states (x-axis), calculated by recasting the problem as a system of linear equations.

Fig. 7. Part F: ranks (y-axis) over states (x-axis), calculated by recasting the problem as finding the eigenvector of the transpose of the transition probability matrix corresponding to eigenvalue 1 (the largest eigenvalue) and then normalizing.