Computing the nearest reversible Markov chain


Konrad-Zuse-Zentrum für Informationstechnik Berlin, Takustraße 7, D-14195 Berlin-Dahlem, Germany

ADAM NIELSEN AND MARCUS WEBER

Computing the nearest reversible Markov chain

ZIB-Report (December 2014)


Computing the nearest reversible Markov chain

Adam Nielsen (nielsen@zib.de), Marcus Weber (weber@zib.de)

Abstract

Reversible Markov chains are the basis of many applications. However, computing transition probabilities by a finite sampling of a Markov chain can lead to truncation errors. Even if the original Markov chain is reversible, the approximated Markov chain might be non-reversible and will lose important properties, like the real-valued spectrum. In this paper, we show how to find the closest reversible Markov chain to a given transition matrix. It turns out that this matrix can be computed by solving a convex minimization problem.

1 Introduction

A Markov chain with $n \in \mathbb{N}$ states is described through a transition matrix $P \in \mathbb{R}^{n \times n}$, i.e.

$$\sum_{j=1}^{n} P_{ij} = 1 \quad \text{and} \quad P_{ij} \geq 0 \quad \text{for } i,j = 1,\dots,n.$$

A vector $\pi \in \mathbb{R}^n$ is called a probability distribution if

$$\sum_{i=1}^{n} \pi_i = 1 \quad \text{and} \quad \pi_j \geq 0$$

holds for $j = 1,\dots,n$. A Markov chain or its corresponding transition matrix $P$ is called reversible according to a probability distribution $\pi$ if the equation

$$DP = P^T D$$

is valid for the diagonal matrix $D = \mathrm{diag}(\pi_1,\dots,\pi_n)$. In this case, $\pi$ is a stationary distribution of the Markov chain, i.e. $\pi^T P = \pi^T$.

The main result of this article is that for any transition matrix $P$, any probability distribution $\pi$, and any norm on $\mathbb{R}^{n \times n}$ which is induced by a scalar product, there exists a unique transition matrix $\bar{P}$ which is reversible according to $\pi$ and has minimal distance to $P$ with respect to that norm.

This article is structured as follows. First, we will prove the above statement. Then, we show how to numerically obtain the reversible matrix $\bar{P}$, discuss the computational cost and include a perturbation analysis. Finally, we will give an application and a numerical example.
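
Reversibility with respect to $\pi$ is just the detailed-balance condition $\pi_i P_{ij} = \pi_j P_{ji}$, which is easy to test numerically. The following small sketch (plain NumPy, with an illustrative two-state chain that is not taken from the report) checks both detailed balance and stationarity:

import numpy as np

def is_reversible(P, pi, tol=1e-12):
    # Detailed balance D P = P^T D, i.e. pi_i * P_ij = pi_j * P_ji for all i, j.
    D = np.diag(pi)
    return np.allclose(D @ P, P.T @ D, atol=tol)

# Illustrative two-state chain (every 2x2 stochastic matrix is reversible
# with respect to its stationary distribution).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2.0 / 3.0, 1.0 / 3.0])

print(np.allclose(pi @ P, pi))   # stationarity: pi^T P = pi^T
print(is_reversible(P, pi))      # detailed balance holds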

2 Existence proof

Given a stochastic matrix $P$ and a stationary distribution $\pi$, it will be proven in the following that there exists a unique transition matrix $\bar{P}$ which minimizes $\|P - \bar{P}\|$ and is reversible according to $\pi$. First, let us consider

$$U = \Big\{A \in \mathbb{R}^{n \times n} \;\Big|\; DA = A^T D \text{ and } \exists k \in \mathbb{R} \text{ with } \sum_{j=1}^{n} A_{ij} = k \text{ for all } i = 1,\dots,n\Big\},$$

where $D = \mathrm{diag}(\pi_1,\dots,\pi_n)$ denotes the diagonal matrix with the values $\pi_i$ on the diagonal. Notice that $U$ is a subspace of $\mathbb{R}^{n \times n}$, because for $A, B \in U$ with $\sum_{j=1}^{n} a_{1j} = k_1$ and $\sum_{j=1}^{n} b_{1j} = k_2$ we obtain for any $\alpha, \beta \in \mathbb{R}$ that

$$D(\alpha A + \beta B) = \alpha DA + \beta DB = \alpha A^T D + \beta B^T D = (\alpha A + \beta B)^T D$$

and $\sum_{j=1}^{n} (\alpha A_{ij} + \beta B_{ij}) = \alpha k_1 + \beta k_2$ for all $i = 1,\dots,n$. We can show the following property:

Lemma 2.1. For $A \in U$ with $\sum_{j=1}^{n} A_{ij} = k$ it holds $\pi^T A = k\pi^T$.

Proof. Since $A \in U$, we have $\pi_i A_{ij} = \pi_j A_{ji}$. Therefore, we obtain

$$(\pi^T A)_i = \sum_{l=1}^{n} \pi_l A_{li} = \pi_i \sum_{l=1}^{n} A_{il} = \pi_i k.$$

For an understanding of the space $U$, the following matrices are of interest:

$$A^{[r,s]}_{ij} = \begin{cases} \pi_s & \text{if } i = r \text{ and } j = s,\\ \pi_r & \text{if } i = s \text{ and } j = r,\\ 1 - \pi_s & \text{if } i = j = r,\\ 1 - \pi_r & \text{if } i = j = s,\\ 1 & \text{if } i = j,\ i \neq r \text{ and } i \neq s,\\ 0 & \text{else,}\end{cases} \qquad \delta^{[r,s]}_{ij} = \begin{cases} 1 & \text{if } i = j \text{ and } i \neq r,\\ 1 & \text{if } i = r \text{ and } j = s,\\ 0 & \text{else.}\end{cases}$$

Proposition 2.1. The set

$$\{A^{[r,s]} \mid (r,s) \in I_A\} \cup \{\delta^{[r,s]}, \delta^{[s,r]} \mid (r,s) \in I_B\} \cup \{\mathrm{Id}\}$$

is a basis of $U$, where $\mathrm{Id}$ is the identity matrix and

$$I = \{(r,s) \mid 1 \leq r < s \leq n\}, \qquad A = \{i \mid \pi_i \neq 0\}, \qquad B = \{i \mid \pi_i = 0\},$$
$$I_A = \{(r,s) \in I \mid r \in A \text{ or } s \in A\}, \qquad I_B = \{(r,s) \in I \mid r, s \in B\}.$$

The dimension of $U$ is given by

$$\dim(U) = \binom{n}{2} + \binom{|B|}{2} + 1.$$

Proof. One may notice that the row sum of $\mathrm{Id}$, $A^{[r,s]}$ and $\delta^{[r,s]}$ is always one and that $DA^{[r,s]} = (A^{[r,s]})^T D$ holds. The latter can be seen by

$$(DA^{[r,s]})_{ij} = \sum_{k=1}^{n} D_{ik} A^{[r,s]}_{kj} = \pi_i A^{[r,s]}_{ij} \overset{(*)}{=} \pi_j A^{[r,s]}_{ji} = \sum_{k=1}^{n} A^{[r,s]}_{ki} D_{kj} = \big((A^{[r,s]})^T D\big)_{ij},$$

where $(*)$ holds because for $i = r$ and $j = s$ or vice versa we have

$$\pi_i A^{[r,s]}_{ij} = \pi_r \pi_s = \pi_s \pi_r = \pi_j A^{[r,s]}_{ji},$$

for $i = j$ the equation in $(*)$ is trivial, and in all other cases we have $A^{[r,s]}_{ij} = A^{[r,s]}_{ji} = 0$; therefore $A^{[r,s]} \in U$. Also, we have $D\delta^{[r,s]} = (\delta^{[r,s]})^T D$ for all $(r,s) \in I_B$. This is because $\pi_r = 0$ and, therefore,

$$(D\delta^{[r,s]})_{ij} = \begin{cases} \pi_i & \text{if } i = j,\\ 0 & \text{else.}\end{cases}$$

Since $D\delta^{[r,s]}$ is just a diagonal matrix, it is in particular symmetric and we obtain $D\delta^{[r,s]} = (D\delta^{[r,s]})^T = (\delta^{[r,s]})^T D$; the argument is analogous for $\delta^{[s,r]}$. To prove that these matrices are indeed a basis of $U$, it remains to show that they are linearly independent and that they span the subspace $U$.

To see the linear independence, consider an arbitrary linear combination of zero:

$$\sum_{(r,s) \in I_A} \alpha_{r,s} A^{[r,s]} + \sum_{(r,s) \in I_B} \big(\alpha_{r,s} \delta^{[r,s]} + \beta_{r,s} \delta^{[s,r]}\big) + \alpha\, \mathrm{Id} = 0.$$

For $(r,s) \in I_A$ the matrix $A^{[r,s]}$ is the only matrix in the above linear combination that could have a non-zero entry in row $r$ and column $s$ and in row $s$ and column $r$. Therefore, we obtain $0 = \alpha_{r,s}\pi_s$ and $0 = \alpha_{r,s}\pi_r$. Thus, $\pi_s \neq 0$ or $\pi_r \neq 0$ provides $\alpha_{r,s} = 0$. For $(r,s) \in I_B$, we obtain analogously $\alpha_{r,s} = 0$ and $\beta_{r,s} = 0$. The linear combination reduces to $\alpha\, \mathrm{Id} = 0$ which, finally, leads us to $\alpha = 0$.

It remains to show that the given matrices span the subspace $U$. Thus, let us consider a matrix $C \in U$ with $\sum_{j=1}^{n} C_{ij} = k$ for some $k \in \mathbb{R}$. For $(r,s) \in I_A$ define $\alpha_{r,s} := C_{sr}/\pi_r$ if $\pi_r \neq 0$ and otherwise $\alpha_{r,s} := C_{rs}/\pi_s$. From $\pi_r C_{rs} = \pi_s C_{sr}$ we get that

$$\alpha_{r,s} A^{[r,s]}_{rs} = C_{rs} \quad \text{and} \quad \alpha_{r,s} A^{[r,s]}_{sr} = C_{sr}.$$

For $(r,s) \in I_B$ choose $\alpha_{r,s} = C_{rs}$ and $\beta_{r,s} = C_{sr}$. Since each off-diagonal element appears in exactly one matrix, $C$ differs from

$$\tilde{C} := \sum_{(r,s) \in I_A} \alpha_{r,s} A^{[r,s]} + \sum_{(r,s) \in I_B} \big(\alpha_{r,s} \delta^{[r,s]} + \beta_{r,s} \delta^{[s,r]}\big)$$

only on the diagonal. We also know that there exists $l \in \mathbb{R}$ with $\sum_{j=1}^{n} \tilde{C}_{ij} = l$ for $i = 1,\dots,n$, since $\tilde{C} \in U$. Therefore, the matrix $\hat{C} := \tilde{C} + (k - l)\,\mathrm{Id}$ has row sum $k$ and

$$\hat{C}_{ii} = k - \sum_{j \neq i} \hat{C}_{ij} = k - \sum_{j \neq i} C_{ij} = C_{ii}$$

holds. Therefore, $C = \hat{C} \in U$, which shows that the given matrices span $U$. Furthermore,

$$\dim U = |I_A| + |I_B| + |I_B| + 1 = |I| + |I_B| + 1.$$

The statement follows from $|I| = \binom{n}{2}$ and $|I_B| = \binom{|B|}{2}$.
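
For readers who want to experiment with this construction, the following sketch builds $A^{[r,s]}$ and $\delta^{[r,s]}$ for a small example distribution (chosen here only for illustration) and verifies that they lie in $U$, i.e. that $DA = A^T D$ holds and that all row sums are equal:

import numpy as np

def A_rs(pi, r, s):
    # Basis matrix A^[r,s] of Proposition 2.1 (0-based indices, r < s).
    n = len(pi)
    A = np.eye(n)
    A[r, s] = pi[s]
    A[s, r] = pi[r]
    A[r, r] = 1.0 - pi[s]
    A[s, s] = 1.0 - pi[r]
    return A

def delta_rs(n, r, s):
    # Basis matrix delta^[r,s]: identity with the (r,r) entry moved to (r,s).
    M = np.eye(n)
    M[r, r] = 0.0
    M[r, s] = 1.0
    return M

pi = np.array([0.5, 0.3, 0.2])      # illustrative distribution with pi_i > 0
D = np.diag(pi)
A = A_rs(pi, 0, 2)

print(np.allclose(D @ A, A.T @ D))  # True: A^[r,s] satisfies detailed balance
print(A.sum(axis=1))                # constant row sums (all equal to one)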

Our goal is to find a matrix $\bar{P} \in X = \{A \in U \mid A_{ij} \geq 0 \text{ for } i,j = 1,\dots,n \text{ and } \sum_{j=1}^{n} a_{1j} = 1\}$ such that

$$\|P - \bar{P}\| \leq \|A - P\| \quad \text{for all } A \in X. \tag{1}$$

To do so, let us characterize the set $X$. To simplify notation, let us denote the basis of Proposition 2.1 by $(v_i)_{i=1,\dots,m}$ with $m = \dim U$ and $v_m = \mathrm{Id}$. For any $A \in X$ we have a unique coefficient vector $x \in \mathbb{R}^m$ with $A = \sum_{i=1}^{m} x_i v_i$. From $\sum_{j=1}^{n} a_{ij} = 1$ we get that

$$1 = \sum_{j=1}^{n} a_{ij} = \sum_{j=1}^{n} \Big(\sum_{l=1}^{m} x_l v_l\Big)(i,j) = \sum_{l=1}^{m} x_l \sum_{j=1}^{n} v_l(i,j) = \sum_{l=1}^{m} x_l.$$

This can be rewritten as $\mathbf{1}^T x = 1$, where $\mathbf{1} \in \mathbb{R}^m$ is the constant vector with $\mathbf{1}_i = 1$ for $i = 1,\dots,m$. Further, we have $a_{ij} \geq 0$ for all $i \neq j$ if and only if $x_l \geq 0$ for all $l = 1,\dots,m-1$. This can be rewritten as $e_i^T x \geq 0$ for $i = 1,\dots,m-1$. The diagonal entries of $A$ can be positive even if $x_m$ is negative. Let us quickly go back to the old notation to see how we have to define the restriction here. So let $A$ be given by

$$A = \sum_{(r,s) \in I_A} \alpha_{r,s} A^{[r,s]} + \sum_{(r,s) \in I_B} \big(\alpha_{r,s} \delta^{[r,s]} + \beta_{r,s} \delta^{[s,r]}\big) + \alpha\, \mathrm{Id},$$

thus

$$a_{ii} = \sum_{(r,s) \in I_A} \alpha_{r,s} A^{[r,s]}_{ii} + \sum_{(r,s) \in I_B} \big(\alpha_{r,s} \delta^{[r,s]}_{ii} + \beta_{r,s} \delta^{[s,r]}_{ii}\big) + \alpha\, \mathrm{Id}_{ii},$$

which leads to

$$a_{ii} = \sum_{\substack{(r,s) \in I_A\\ r = i}} \alpha_{r,s}(1 - \pi_s) + \sum_{\substack{(r,s) \in I_A\\ s = i}} \alpha_{r,s}(1 - \pi_r) + \sum_{\substack{(r,s) \in I_A\\ r \neq i \neq s}} \alpha_{r,s} + \sum_{\substack{(r,s) \in I_B\\ r \neq i}} \alpha_{r,s} + \sum_{\substack{(r,s) \in I_B\\ s \neq i}} \beta_{r,s} + \alpha.$$

The condition $a_{ii} \geq 0$ is equivalent to $g_i^T x \geq 0$, where

$$g_i(j) = \begin{cases} 1 - \pi_s & \text{if } v_j = A^{[i,s]} \text{ for some } s > i,\\ 1 - \pi_r & \text{if } v_j = A^{[r,i]} \text{ for some } r < i,\\ 0 & \text{if } v_j = \delta^{[i,s]} \text{ for some } s,\\ 1 & \text{else,}\end{cases}$$

for $i = 1,\dots,n$ and $j = 1,\dots,m$. Given the matrix

$$C = \begin{pmatrix} e_1^T \\ \vdots \\ e_{m-1}^T \\ g_1^T \\ \vdots \\ g_n^T \end{pmatrix} \in \mathbb{R}^{(n+m-1) \times m},$$

the condition that $A = \sum_{i=1}^{m} x_i v_i$ is in the set $X$ is equivalent to

$$Cx \geq 0 \quad \text{and} \quad \mathbf{1}^T x = 1.$$

In order to find a matrix $\bar{P} \in X$ which satisfies inequality (1), let us recall that the norm is induced by a scalar product $\langle \cdot, \cdot \rangle$, i.e. $\|A\| = \sqrt{\langle A, A \rangle}$ for any matrix $A \in \mathbb{R}^{n \times n}$. We want to minimize the term

$$\Big\|\sum_{i=1}^{m} x_i v_i - P\Big\|^2 = \sum_{i,j=1}^{m} x_i x_j \langle v_i, v_j \rangle - 2\sum_{i=1}^{m} x_i \langle v_i, P \rangle + \langle P, P \rangle = \frac{1}{2} x^T Q x + x^T f + c \tag{2}$$

subject to $Cx \geq 0$ and $\mathbf{1}^T x = 1$, with

$$Q(i,j) := 2\langle v_i, v_j \rangle, \qquad f(i) := -2\langle v_i, P \rangle \qquad \text{and} \qquad c := \langle P, P \rangle.$$

Since $Q$ is a Gram matrix of linearly independent vectors, it is positive definite. This follows because for any $x \in \mathbb{R}^m$ it holds

$$x^T Q x = \sum_{i,j=1}^{m} x_i Q_{ij} x_j = 2\Big\langle \sum_{i=1}^{m} x_i v_i, \sum_{i=1}^{m} x_i v_i \Big\rangle.$$

Since $v_1,\dots,v_m$ is a basis, we have that $\sum_{i=1}^{m} x_i v_i \neq 0$ for $x \neq 0$ and, in consequence, $x^T Q x > 0$ for $x \neq 0$. Since $Q$ is positive definite, the quadratic function (2) is strongly convex. Therefore, we have transformed the problem into a strongly convex quadratic programming problem that attains its global minimum, since the quadratic function is coercive and continuous and the set $X$ is non-empty because of $\mathrm{Id} \in X$, see [9]. Also, the global minimum is unique because the quadratic function is strongly convex. Thus, we have proven the existence of a unique matrix $\bar{P} \in X$ which fulfills inequality (1). We will now discuss how to compute this matrix.
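
The same minimizer can also be obtained without assembling the basis $(v_i)$ explicitly, by posing the problem directly in the matrix variable: minimize $\|X - P\|$ over all nonnegative, row-stochastic $X$ with $DX = X^T D$. The following sketch does this for the Frobenius norm with a generic convex solver; it assumes the cvxpy package is available and is meant only as an illustration, not as the coefficient-space QP (2) analyzed above.

import numpy as np
import cvxpy as cp

def nearest_reversible(P, pi):
    # Closest transition matrix (Frobenius norm) that is reversible w.r.t. pi.
    n = P.shape[0]
    D = np.diag(pi)
    X = cp.Variable((n, n), nonneg=True)              # X_ij >= 0
    constraints = [cp.sum(X, axis=1) == 1,            # row sums equal one
                   D @ X == X.T @ D]                  # detailed balance D X = X^T D
    problem = cp.Problem(cp.Minimize(cp.sum_squares(X - P)), constraints)
    problem.solve()
    return X.value

# Illustrative (non-reversible) transition matrix and distribution.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])
pi = np.array([0.4, 0.4, 0.2])
print(np.round(nearest_reversible(P, pi), 4))

For a norm induced by another scalar product, such as the weighted Frobenius norm used in Section 3.2, only the objective changes.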

3 Complexity and Perturbation

To avoid technical difficulties, we will assume in this chapter that $\pi_i > 0$ for $i = 1,\dots,n$. For $A \in \mathbb{R}^{n \times n}$ the trace $\mathrm{tr}(A)$ is given by the sum of the diagonal elements, $\mathrm{tr}(A) = \sum_{i=1}^{n} A_{ii}$. It is known that

$$\langle A, B \rangle_F := \mathrm{tr}(A^T B)$$

is a scalar product on $\mathbb{R}^{n \times n}$ for $A, B \in \mathbb{R}^{n \times n}$. This scalar product induces the Frobenius norm

$$\|A\|_F = \sqrt{\langle A, A \rangle_F} = \sqrt{\sum_{i,j=1}^{n} a_{ij}^2}.$$

The following complexity analysis will be given according to the Frobenius norm.

3.1 Complexity

Unfortunately, the matrix $Q$ is quite large; in particular we have $Q \in \mathbb{R}^{m \times m}$ where

$$m = \dim(U) = 1 + \frac{n^2 - n}{2}.$$

Each entry of $Q$ is given by the trace of a product of two sparse matrices, which can be computed by the following formula:

$$\langle A^{[r,s]}, A^{[r',s']} \rangle_F = \begin{cases} n - \pi_r - \pi_{r'} - \pi_s - \pi_{s'} & \text{if } r, r', s, s' \text{ are distinct,}\\ n - 1 - 2\pi_s + (1 - \pi_r)(1 - \pi_{r'}) & \text{if } r \neq r',\ s = s',\\ n - 1 - 2\pi_r + (1 - \pi_s)(1 - \pi_{s'}) & \text{if } r = r',\ s \neq s',\\ n - 1 - 2\pi_r + (1 - \pi_s)(1 - \pi_{r'}) & \text{if } r = s',\\ n - 1 - 2\pi_s + (1 - \pi_r)(1 - \pi_{s'}) & \text{if } s = r',\\ n - 2 + (1 - \pi_r)^2 + \pi_r^2 + (1 - \pi_s)^2 + \pi_s^2 & \text{if } r = r' \text{ and } s = s'.\end{cases}$$

Note that because of $r < s$ and $r' < s'$ other cases are not possible. We only give a proof for the case $r \neq r'$ and $s = s'$; all other cases can be obtained analogously. First, if $A^{[r,s]}_{ij} \neq 0$ for $i \neq j$, it follows that either $i = r, j = s$ or $i = s, j = r$ holds. Since $r \neq r'$ and $s = s'$, we have $A^{[r',s']}_{ij} = 0$ for these index pairs and, therefore,

$$\langle A^{[r,s]}, A^{[r',s']} \rangle_F = \sum_{i=1}^{n} A^{[r,s]}_{ii} A^{[r',s']}_{ii} = n - 3 + A^{[r,s]}_{rr} A^{[r',s']}_{rr} + A^{[r,s]}_{r'r'} A^{[r',s']}_{r'r'} + A^{[r,s]}_{ss} A^{[r',s']}_{ss} = n - 3 + 1 - \pi_s + 1 - \pi_{s'} + (1 - \pi_r)(1 - \pi_{r'}) = n - 1 - 2\pi_s + (1 - \pi_r)(1 - \pi_{r'}).$$

Therefore, the computational cost to evaluate a single entry of $Q$ does not increase with $n$, and the effort to compute and store $Q$ is $O(n^4)$. The convex minimization problem can be solved with a barrier method, e.g. the interior point method. The method consists of $N$ Newton iterations.

The number $N$ of Newton iterations to find a strictly feasible point is bounded by

$$N \leq \frac{\sqrt{n + n^2/2 - 1}}{\gamma} \log\Big(\frac{n + n^2/2 - 1}{\varepsilon}\Big),$$

where $\varepsilon > 0$ is the demanded accuracy and $\gamma$ is a constant depending on the choice of the backtracking parameters, see [1]. Therefore, the number of Newton iterations is bounded by $O(n \log n)$. Unfortunately, each Newton iteration has to solve a linear equation system and costs $O((n^2)^3)$, since $Q \in \mathbb{R}^{(\frac{n^2-n}{2}+1) \times (\frac{n^2-n}{2}+1)}$. Therefore, the total cost for the optimization problem is bounded by $O(n^7 \log n)$. In the field of convex optimization, the upper bound for the number of Newton iterations is known to be a large overestimation [1]. If we assume this number to be independent of $n$, then we can expect that the time to find a solution of the convex optimization problem should be given by $g(n) = \alpha n^6$, where $n$ is the size of the matrix $P \in \mathbb{R}^{n \times n}$. If we include the pessimistic estimate for the Newton iterations, then the time should be represented by $f(n) = \beta n^7 \log(n)$.

In order to explore the computation time of the convex optimization problem, we generated for each $n = 10,\dots,100$ a stochastic matrix $P \in \mathbb{R}^{n \times n}$ and a stochastic vector $\pi \in \mathbb{R}^n$, where each entry was drawn from the standard uniform distribution on the open interval $(0,1)$, and then we normalized $P$ and $\pi$. We used Matlab R2012b on a 3 GHz computer with 8 GB memory. We solved the convex optimization problem with the interior-point-convex algorithm of the Matlab routine quadprog with default options, i.e. relative dual feasibility $\approx 2.31\mathrm{e}{-15}$ with TolFun $= 1\mathrm{e}{-8}$, complementarity measure $\approx 1.68\mathrm{e}{-10}$ with TolFun $= 1\mathrm{e}{-8}$ and relative maximal constraint violation $= 0$ with TolCon $= 1\mathrm{e}{-8}$. The scalars $\alpha$ and $\beta$ were chosen such that $f(45) = g(45)$ matches the measured execution time for $n = 45$. In Figure 1 one can see that $g$ seems to be a reasonable approximation of the execution time. One can also see that the execution time takes only seconds for matrices in $\mathbb{R}^{50 \times 50}$, but already 32 minutes for matrices in $\mathbb{R}^{100 \times 100}$.

3.2 Perturbation analysis

For

$$k := \min_{r,r',s,s'} \langle A^{[r,s]}, A^{[r',s']} \rangle_F$$

we obtain $k\,\mathrm{Id} \leq Q \leq n\,\mathrm{Id}$, where the inequality has to be read componentwise. The upper bound follows from

$$\langle v_i, v_j \rangle_F \leq \sqrt{\langle v_i, v_i \rangle_F \langle v_j, v_j \rangle_F} \leq \max_{l=1,\dots,m} \langle v_l, v_l \rangle_F$$

and the fact that $\langle A^{[r,s]}, A^{[r,s]} \rangle_F = n - 2 + (1 - \pi_r)^2 + \pi_r^2 + (1 - \pi_s)^2 + \pi_s^2 \leq n$ and $\langle \mathrm{Id}, \mathrm{Id} \rangle_F = n$ holds.

[Figure 1: Duration of the convex minimization problem to find the closest reversible Markov chain; computation time in minutes versus $n$, together with the model curves $g$ and $f$.]

Since $Q$ is symmetric, its condition number according to the spectral norm is given by

$$\kappa(Q) = \frac{\lambda_{\max}}{\lambda_{\min}} \leq \frac{n}{k}.$$

Since $k = n - c$ for some $0 < c < 4$, we have that

$$\kappa(Q) \leq \frac{1}{1 - c/n} \to 1 \quad \text{for } n \to \infty.$$

Therefore, $Q$ is well conditioned. We assume now that we are interested in finding a reversible matrix $\hat{P} \in X$, but that we only have a perturbed version $P = \hat{P} + E$ of $\hat{P}$ which is not necessarily reversible. With the above scheme, we can find a reversible matrix $\bar{P} \in X$ which is closest to $P$ according to the Frobenius norm. The question arises how the eigenvalues change between $\bar{P}$ and $\hat{P}$. In order to answer this question, the weighted Frobenius norm

$$\|A\|_{\widetilde{F}} := \|D^{1/2} A D^{-1/2}\|_F$$

is introduced, where $D = \mathrm{diag}(\pi_1,\dots,\pi_n)$, $D^{1/2} D^{1/2} = D$ and $D^{-1/2} := (D^{1/2})^{-1}$. Note that the weighted Frobenius norm is given by

$$\|A\|_{\widetilde{F}}^2 = \sum_{i,j=1}^{n} a_{ij}^2 \frac{\pi_i}{\pi_j}.$$

We can then give the following estimate for the eigenvalues:

Theorem 3.1. Let $A, B \in X$, let $\lambda_1,\dots,\lambda_n$ be the eigenvalues of $A$ and $\hat{\lambda}_1,\dots,\hat{\lambda}_n$ be the eigenvalues of $B$. Then there exists a permutation $\sigma$ of the integers $1,2,\dots,n$ such that

$$\sum_{i=1}^{n} |\hat{\lambda}_{\sigma(i)} - \lambda_i|^2 \leq \|A - B\|_{\widetilde{F}}^2.$$

Proof. The matrices $A, B \in X$ are self-adjoint according to the scalar product $\langle x, y \rangle_\pi := x^T D y$. Let us denote by $\{w_1,\dots,w_n\}$ a $\langle \cdot, \cdot \rangle_\pi$-orthonormal basis, and by $W$ the matrix whose columns contain the vectors $w_i$, i.e. $W = (w_1\ w_2\ \dots\ w_n)$. Then $\tilde{A} := W^{-1} A W$ and $\tilde{B} := W^{-1} B W$ are symmetric, see [3]. By the Hoffman-Wielandt theorem (see [5]) we obtain

$$\sum_{i=1}^{n} |\hat{\lambda}_{\sigma(i)} - \lambda_i|^2 \leq \|\tilde{A} - \tilde{B}\|_F^2$$

for a permutation $\sigma$, because similar matrices have the same eigenvalues. It remains to show

$$\|W^{-1} C W\|_F^2 = \|C\|_{\widetilde{F}}^2, \quad \text{or equivalently} \quad \|C\|_F^2 = \|W C W^{-1}\|_{\widetilde{F}}^2,$$

for any matrix $C \in \mathbb{R}^{n \times n}$. By construction of $W$ we have

$$W^T D W = I \quad \text{and} \quad W^{-1} D^{-1} (W^T)^{-1} = I. \tag{3}$$

Therefore,

$$\|W C W^{-1}\|_{\widetilde{F}}^2 = \|D^{1/2} W C W^{-1} D^{-1/2}\|_F^2 = \mathrm{tr}\big(D^{-1/2} (W^{-1})^T C^T W^T D^{1/2} D^{1/2} W C W^{-1} D^{-1/2}\big) \overset{(*)}{=} \mathrm{tr}\big(W^{-1} D^{-1} (W^{-1})^T C^T W^T D W C\big) \overset{(3)}{=} \mathrm{tr}(C^T C) = \|C\|_F^2,$$

where in $(*)$ it is used that the trace is invariant under cyclic permutations.

This shows that in order to guarantee good approximations of the eigenvalues, one has to assure a good approximation of $a_{ij}$ for those $i, j$ where $\pi_j \ll \pi_i$. These transitions are also known as rare events and are often difficult to compute.

We will end this chapter with a short experiment motivated by Theorem 3.1. We create 100 reversible Markov chains $(\hat{P}_i)_{i=1,\dots,100} \subset \mathbb{R}^{5 \times 5}$ as follows. First,

we generate a symmetric matrix $A$ where each entry $A_{ij}$ with $i \leq j$ is drawn from the uniform distribution on $[0,1]$ and $A_{ij} := A_{ji}$ for $i > j$. After normalizing the rows of $A$, the resulting matrix is reversible [7] and will be denoted as $\hat{P}_i$. We perturb and normalize $\hat{P}_i$ to obtain a perturbed version $P_i$ of $\hat{P}_i$. For each $P_i$ we compute the unique reversible matrix $\bar{P}_i$ which is closest to $P_i$ according to the weighted Frobenius norm and according to the stationary distribution of $\hat{P}_i$. The matrix $\bar{P}_i$ exists and can be computed by the introduced convex optimization problem, because the weighted Frobenius norm is induced by the scalar product

$$\langle A, B \rangle_{\widetilde{F}} = \mathrm{tr}(D A D^{-1} B^T).$$

Also, we compute the unique reversible matrix $\check{P}_i$ which is closest to $P_i$ according to the standard Frobenius norm and according to the stationary distribution of $\hat{P}_i$. For each tuple $(\hat{P}, P, \bar{P}, \check{P})$ we compute the corresponding eigenvalues $(\hat{\lambda}_j, \lambda_j, \bar{\lambda}_j, \check{\lambda}_j)_{j=1,\dots,5}$. Then, we compute the numbers

$$c_1 = \min_{\sigma \in \Pi_5} \sum_{j=1}^{5} |\hat{\lambda}_j - \bar{\lambda}_{\sigma(j)}|^2, \qquad c_2 = \min_{\sigma \in \Pi_5} \sum_{j=1}^{5} |\hat{\lambda}_j - \lambda_{\sigma(j)}|^2 \qquad \text{and} \qquad c_3 = \min_{\sigma \in \Pi_5} \sum_{j=1}^{5} |\hat{\lambda}_j - \check{\lambda}_{\sigma(j)}|^2,$$

where $\Pi_5$ denotes the set of all permutations of the numbers $\{1,\dots,5\}$. The experiment is visualized in Figures 2 and 3. The result is that the closest reversible matrix $\check{P}$ according to the Frobenius norm does not maintain the spectrum of $\hat{P}$. However, the closest reversible matrix $\bar{P}$ according to the weighted Frobenius norm gives a good approximation of the spectrum of $\hat{P}$ and always improves the spectrum of the perturbed version $P$.

[Figure 2: Comparison of the weighted correction $c_1$ and the perturbation $c_2$, plotted against $\|\hat{P} - \bar{P}\|_{\widetilde{F}}$.]
[Figure 3: Comparison of the perturbation $c_2$ and the unweighted correction $c_3$, plotted against $\|\hat{P} - \check{P}\|_F$.]
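
This experiment is easy to reproduce in outline. The sketch below is illustrative only: it assumes cvxpy for the convex program, uses an arbitrary perturbation size and estimates the stationary distribution by a power of the chain. It generates one reversible test chain, perturbs it, computes the nearest reversible chain in the weighted Frobenius norm and evaluates the eigenvalue distances $c_1$ and $c_2$:

import numpy as np
import cvxpy as cp
from itertools import permutations

rng = np.random.default_rng(0)

def random_reversible(n):
    # Symmetrize a uniform random matrix and normalize its rows; the result is reversible [7].
    A = rng.uniform(size=(n, n))
    A = (A + A.T) / 2.0
    return A / A.sum(axis=1, keepdims=True)

def nearest_reversible_weighted(P, pi):
    # Closest reversible chain w.r.t. the weighted Frobenius norm ||D^(1/2) (X - P) D^(-1/2)||_F.
    n = len(pi)
    W = np.sqrt(np.outer(pi, 1.0 / pi))                # W_ij = sqrt(pi_i / pi_j)
    X = cp.Variable((n, n), nonneg=True)
    cons = [cp.sum(X, axis=1) == 1, np.diag(pi) @ X == X.T @ np.diag(pi)]
    cp.Problem(cp.Minimize(cp.sum_squares(cp.multiply(W, X - P))), cons).solve()
    return X.value

def eig_distance(A, B):
    # min over permutations of sum_j |lambda_j(A) - lambda_sigma(j)(B)|^2 (cf. Theorem 3.1);
    # imaginary parts (possible for the non-reversible matrices) are dropped.
    la, lb = np.linalg.eigvals(A).real, np.linalg.eigvals(B).real
    return min(((la - lb[list(s)]) ** 2).sum() for s in permutations(range(len(la))))

P_hat = random_reversible(5)
pi = np.linalg.matrix_power(P_hat, 200)[0]             # approximate stationary distribution
P = P_hat + 0.05 * rng.uniform(size=(5, 5))            # perturb ...
P = P / P.sum(axis=1, keepdims=True)                   # ... and re-normalize the rows
P_bar = nearest_reversible_weighted(P, pi)

print("c1 (weighted correction):", eig_distance(P_hat, P_bar))
print("c2 (perturbation):       ", eig_distance(P_hat, P))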

Three remarks may be in order. The resulting matrix

$$\bar{P} = \sum_{(r,s) \in I} \alpha_{r,s} A^{[r,s]} + \alpha\, \mathrm{Id}$$

is reversible up to machine precision. In fact, it is reversible if $\pi_r \alpha_{r,s} \pi_s = \pi_s \alpha_{r,s} \pi_r$ holds.¹ A numerical error in the solution $(x_i)_{i=1,\dots,m-1} = (\alpha_{r,s})_{(r,s) \in I}$ of the convex optimization problem results in an error of the entries $\bar{P}(r,s) = \alpha_{r,s} \pi_s$ and $\bar{P}(s,r) = \alpha_{r,s} \pi_r$ for $r \neq s$. In order to guarantee that $\bar{P}$ has row sum one, it might be advantageous to set $\bar{p}_{ii} = 1 - \sum_{j \neq i} \bar{p}_{ij}$.

¹ One may note that $I_A = I$ because of the assumption at the beginning of this chapter.

4 Application and numerical example

In the past decades, the analysis of a certain class of stochastic processes has been formulated in terms of approximating a transfer operator $T\colon L^2(\mu) \to L^2(\mu)$ [12, 8, 11] which is self-adjoint according to the scalar product

$$\langle f, g \rangle_\mu = \int_E f(x)\, g(x)\, \mu(dx),$$

i.e. $\langle Tf, g \rangle_\mu = \langle f, Tg \rangle_\mu$ for all $f, g \in L^2(\mu)$, where $(E, \Sigma, \mu)$ is a measure space for some set $E \subset \mathbb{R}^n$ and

$$L^2(\mu) = \{f\colon E \to \mathbb{R} \mid \langle f, f \rangle_\mu < \infty,\ f\ \mu\text{-measurable}\}/N$$

with $N = \{f\colon E \to \mathbb{R} \mid \exists A \in \Sigma \text{ with } \mu(A) = 0 \text{ and } f(x) = 0 \text{ for all } x \in E \setminus A\}$. Furthermore, the operator has the property that for $f \in L^2(\mu)$ with $f \geq 0$ it follows $Tf \geq 0$ almost surely. Let us assume that we have a set of non-negative functions $\{\varphi_1,\dots,\varphi_n\}$ given, such that $\sum_{i=1}^{n} \varphi_i(x) = 1$ for all $x \in E$ and $\langle \varphi_i, \mathbf{1} \rangle_\mu > 0$ for $i = 1,\dots,n$, where $\mathbf{1}(x) := 1$ for all $x \in E$. If we define

$$\pi_i := \langle \varphi_i, \mathbf{1} \rangle_\mu \quad \text{and} \quad T_{ij} := \frac{\langle T\varphi_i, \varphi_j \rangle_\mu}{\langle \varphi_i, \mathbf{1} \rangle_\mu},$$

then it is straightforward to verify the properties

(i) $\pi^T T = \pi^T$, (ii) $DT = T^T D$, (iii) $\sum_{j=1}^{n} T_{ij} = 1$ for all $i = 1,\dots,n$, (iv) $T_{ij} \geq 0$,

where $D = \mathrm{diag}(\pi_1,\dots,\pi_n)$ denotes the diagonal matrix of $\pi$. In other words, $T$ is a reversible Markov chain. This matrix plays an essential role in the Galerkin discretization of $T$.

Theorem 4.1. Let $\{\varphi_1,\dots,\varphi_n\} \subset L^2(\mu)$ be a basis with $\langle \varphi_i, \mathbf{1} \rangle_\mu > 0$ of a subspace $\mathcal{D}$, and let $Q\colon L^2(\mu) \to \mathcal{D}$ be the orthogonal projection onto $\mathcal{D}$. For any self-adjoint continuous operator $T\colon L^2(\mu) \to L^2(\mu)$, the matrix

$$M = S^{-1} T, \qquad T_{ij} = \frac{\langle T\varphi_i, \varphi_j \rangle_\mu}{\langle \varphi_i, \mathbf{1} \rangle_\mu}, \qquad S_{ij} = \frac{\langle \varphi_i, \varphi_j \rangle_\mu}{\langle \varphi_i, \mathbf{1} \rangle_\mu},$$

is a right matrix representation of $QTQ$ according to the basis $\{\varphi_1,\dots,\varphi_n\}$, i.e. for any $f = \sum_{i=1}^{n} \alpha_i \varphi_i$ and $QTQf = \sum_{i=1}^{n} \beta_i \varphi_i$ it holds

$$M(\alpha_1,\dots,\alpha_n)^T = (\beta_1,\dots,\beta_n)^T.$$

Proof. Consider the Gram matrix of $\{\varphi_1,\dots,\varphi_n\}$, $\hat{S}_{ij} = \langle \varphi_i, \varphi_j \rangle_\mu$. This matrix is invertible since $\{\varphi_1,\dots,\varphi_n\}$ is a basis, and the orthogonal projection $Q$ can be represented as

$$Qv = \sum_{i,j=1}^{n} \hat{S}^{-1}_{ij} \langle v, \varphi_i \rangle_\mu\, \varphi_j.$$

This can be verified by checking $\langle Qv - v, g \rangle_\mu = 0$ for all $g \in \mathcal{D}$ and $v \in L^2(\mu)$. From $S = D^{-1} \hat{S}$ with $D = \mathrm{diag}(\langle \varphi_1, \mathbf{1} \rangle_\mu,\dots,\langle \varphi_n, \mathbf{1} \rangle_\mu)$ we obtain $S^{-1} = \hat{S}^{-1} D$ and, therefore,

$$\hat{S}^{-1}_{ij} = \frac{S^{-1}_{ij}}{\langle \varphi_j, \mathbf{1} \rangle_\mu} = \frac{S^{-1}_{ji}}{\langle \varphi_i, \mathbf{1} \rangle_\mu},$$

where in the last step it was used that $\hat{S}^{-1}$ is symmetric since $\hat{S}$ is symmetric. This implies

$$Qv = \sum_{i,j=1}^{n} S^{-1}_{ji} \frac{\langle v, \varphi_i \rangle_\mu}{\langle \varphi_i, \mathbf{1} \rangle_\mu}\, \varphi_j.$$

Therefore,

$$QTQ\varphi_k = QT\varphi_k = \sum_{i,j=1}^{n} S^{-1}_{ji} \frac{\langle T\varphi_k, \varphi_i \rangle_\mu}{\langle \varphi_i, \mathbf{1} \rangle_\mu}\, \varphi_j = \sum_{i,j=1}^{n} S^{-1}_{ji} \frac{\langle T\varphi_i, \varphi_k \rangle_\mu}{\langle \varphi_i, \mathbf{1} \rangle_\mu}\, \varphi_j = \sum_{j=1}^{n} \Big(\sum_{i=1}^{n} S^{-1}_{ji} T_{ik}\Big) \varphi_j = \sum_{j=1}^{n} (S^{-1} T)_{jk}\, \varphi_j.$$

For the more general case that the operator is not self-adjoint, one can find a similar Galerkin approximation, see [10], Theorem 3. Note that $S$ is a special case of the matrix $T$ with $T = I$, where $I$ denotes the identity operator. Therefore, we can use our machinery to correct $T$ and $S$ if $\pi$ is given. One possible choice for the functions $\varphi_i$ are set-based functions, i.e. we have disjoint sets $A_1,\dots,A_n$ with $\bigcup_{i=1}^{n} A_i = E$ and

$$\varphi_i(x) = \mathbf{1}_{A_i}(x) = \begin{cases} 1 & \text{if } x \in A_i,\\ 0 & \text{else.}\end{cases}$$

In this case, the matrix $S$ from Theorem 4.1 turns out to be the identity matrix and we are only left with the task of computing $T$. Other possible choices are radial basis functions or committor functions, which can be found in [13, 10].

So far, we know how to maintain reversibility for $S$ and $T$ separately. The question arises whether $S^{-1} T$ also inherits some properties that might get lost due to numerical errors. First, we can rewrite $S = D^{-1} \hat{S}$ and $T = D^{-1} \hat{T}$ with $\hat{S}_{ij} = \langle \varphi_i, \varphi_j \rangle_\mu$ and $\hat{T}_{ij} = \langle T\varphi_i, \varphi_j \rangle_\mu$; hence

$$S^{-1} T = \hat{S}^{-1} D D^{-1} \hat{T} = \hat{S}^{-1} \hat{T}.$$

Analogously as shown for $Q$, also $\hat{S}$ and $\hat{S}^{-1}$ are symmetric positive definite matrices. Now, since $\hat{S}^{-1}$ is positive definite, we know of the existence of a symmetric square root matrix $A$ such that $A^2 = \hat{S}^{-1}$. Thus,

$$A^{-1} \hat{S}^{-1} \hat{T} A = A \hat{T} A.$$

Consequently, $\hat{S}^{-1} \hat{T}$ is similar to a symmetric matrix and hence diagonalizable. This shows that the spectrum of $\hat{S}^{-1} \hat{T}$ is real and that there exists a basis of eigenvectors of $S^{-1} T$.

For the numerical estimation, the following procedure may be advantageous. First, approximate $T$ and $S$ by the convex minimization problem stated above, which yields reversible matrices $\bar{T}$ and $\bar{S}$. Then, calculate $D\bar{S}$, which is symmetric up to machine precision. For the symmetric matrix $(D\bar{S})^{-1}$ one can solve a convex minimization problem that finds the closest symmetric positive semidefinite matrix $((D\bar{S})^{-1})^*$ according to the Frobenius norm, see [4] and the sketch below. Finally, $((D\bar{S})^{-1})^* (D\bar{T})$ is the corrected matrix. It would also be possible to correct $S^{-1}$ with the optimization scheme explained above, because this matrix also has row sum one and is reversible; this follows from

$$S\mathbf{1} = \mathbf{1} \ \Rightarrow\ \mathbf{1} = S^{-1}\mathbf{1} \qquad \text{and} \qquad DS = S^T D \ \Rightarrow\ (S^T)^{-1} D = D S^{-1} \ \Rightarrow\ (S^{-1})^T D = D S^{-1}.$$

Therefore, by dropping the constraints given by the matrix $C$, one can also approximate $S^{-1}$ by a convex quadratic programming problem with only linear constraints.
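
The projection onto the symmetric positive semidefinite matrices needed in this procedure has a simple spectral form [4]: symmetrize, diagonalize and clip negative eigenvalues at zero. A minimal sketch (the example matrix is only illustrative):

import numpy as np

def nearest_psd(A):
    # Nearest symmetric positive semidefinite matrix in the Frobenius norm [4]:
    # symmetrize A, then set all negative eigenvalues to zero.
    B = (A + A.T) / 2.0
    w, V = np.linalg.eigh(B)
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

# Symmetric example matrix with one negative eigenvalue.
A = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, -0.8],
              [0.0, -0.8, 0.2]])
A_psd = nearest_psd(A)
print(np.linalg.eigvalsh(A_psd).min() >= -1e-12)   # all eigenvalues are nonnegative

In the procedure above this projection would be applied to $(D\bar{S})^{-1}$ before forming the product with $D\bar{T}$.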

A Galerkin approximation of the transfer operator can be used to apply spectral clustering algorithms such as PCCA+ [2], which relies on the real spectrum of $S^{-1} T$. Especially for a set-based discretization, we can guarantee with our machinery that $\bar{T}$ will have a real spectrum and, therefore, spectral clustering such as PCCA+ is applicable.

4.1 Numerical example

Consider the one-dimensional, $2\pi$-periodic function $V_B\colon \mathbb{R} \to \mathbb{R}$,

$$V_B(x) = a + b\cos(x) + c\cos^2(x) + d\cos^3(x)$$

with $a = 2.0567$, $b = \dots$, $c = \dots$ and $d = \dots$. This can be seen as an approximation of a butane potential energy function, where $x$ is the central dihedral angle. We are going to realize a trajectory of the dihedral angles $X_t = \tilde{X}_t \bmod 2\pi$ of butane from the stochastic differential equation

$$d\tilde{X}_t = -V_B'(\tilde{X}_t)\, dt + \sigma\, dB_t.$$

If we divide $[0, 2\pi]$ into the sets $A_i = \big[\tfrac{i-1}{30}\, 2\pi, \tfrac{i}{30}\, 2\pi\big]$ for $i = 1,\dots,30$, then the Galerkin discretization of $T$ reduces to

$$T_{ij} = \mathbb{P}(X_\tau \in A_j \mid X_0 \in A_i),$$

see [8]. To obtain the associated Markov State Model $T \in \mathbb{R}^{30 \times 30}$, we compute a long-term trajectory $(X_i)_{i=0,\dots,n-1}$ by performing $n$ timesteps of size $dt$ using the Euler-Maruyama discretization

$$X_{i+1} = X_i - V_B'(X_i)\, dt + \sigma \sqrt{dt}\, \eta_i,$$

where the $\eta_i$ are i.i.d. random variables distributed according to the standard normal distribution. This trajectory is chopped into pieces of length 400, yielding $M$ subtrajectories

$$(X^k_i)_{i=1,\dots,l} := (X_{lk},\dots,X_{l(k+1)-1}) \quad \text{for } k = 0,\dots,M-1.$$

We can estimate $T$ by counting transitions,

$$C_{ij} = \sum_{k=0}^{M-1} \mathbf{1}_{A_i}(X^k_1)\, \mathbf{1}_{A_j}(X^k_l) \quad \text{and} \quad T_{ij} = \frac{C_{ij}}{\sum_{j=1}^{30} C_{ij}}.$$

The Markov State Model created in this way only becomes reasonable when considered for a trajectory longer than $10^6$ timesteps, due to rare transition events. We will now compare the eigenvalues of the Markov State Model $T$ to the eigenvalues of the above introduced corrected estimation $\bar{T}$ with respect to the weighted Frobenius norm $\|\cdot\|_{\widetilde{F}}$. We will also compare them for different lengths of a given trajectory, starting from length $n = 10^6$ until $n = \dots$. We used the timestep $dt = \dots$. Since the eigenvalues of $T$ sometimes turned out to be complex, we simply set the imaginary part to zero in order to compare the eigenvalues with those of $\bar{T}$.
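
The estimation step can be sketched as follows. The coefficients $b$, $c$, $d$, the noise intensity $\sigma$ and the timestep are placeholders here (the report's values are not reproduced above), so the snippet illustrates only the simulation and counting scheme, not the exact experiment:

import numpy as np

rng = np.random.default_rng(1)

# Placeholder parameters: only a is taken from the text; b, c, d, sigma and dt
# are illustrative values, not the ones used in the report.
a, b, c, d = 2.0567, -1.0, 0.5, 1.0
sigma, dt = 1.0, 1e-3

def dVB(x):
    # Derivative of V_B(x) = a + b cos x + c cos^2 x + d cos^3 x.
    return -b * np.sin(x) - 2.0 * c * np.cos(x) * np.sin(x) - 3.0 * d * np.cos(x) ** 2 * np.sin(x)

# Euler-Maruyama realization of dX = -V_B'(X) dt + sigma dB, taken mod 2*pi.
n_steps = 200_000
X = np.empty(n_steps)
X[0] = 0.0
eta = rng.standard_normal(n_steps - 1)
for i in range(n_steps - 1):
    X[i + 1] = (X[i] - dVB(X[i]) * dt + sigma * np.sqrt(dt) * eta[i]) % (2.0 * np.pi)

# Chop into subtrajectories of length l = 400 and count first-to-last transitions.
l, n_sets = 400, 30
bins = np.minimum((X / (2.0 * np.pi) * n_sets).astype(int), n_sets - 1)
M = n_steps // l
C = np.zeros((n_sets, n_sets))
for k in range(M):
    C[bins[k * l], bins[(k + 1) * l - 1]] += 1.0
rows = C.sum(axis=1, keepdims=True)
T = np.divide(C, rows, out=np.zeros_like(C), where=rows > 0)   # row-stochastic MSM estimate

The corrected estimation $\bar{T}$ would then be obtained from $T$ (and the estimated stationary distribution) with the convex program of Section 2.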

[Figure: second eigenvalue of the corrected estimation and the standard estimation, plotted against the amount of trajectories ($\times 10^7$).]

Discussion

In the figures, the standard estimation and the corrected estimation do not essentially differ for the eigenvalues close to 1. Our algorithm for finding the nearest reversible Markov chain especially preserves the dominant eigenvalues of the underlying operator and is thus suitable for cluster analysis² of molecular simulation data. In contrast to $T$, the resulting transition matrix $\bar{T}$ from our algorithm is guaranteed to be applicable to spectral clustering methods, such as PCCA+, because of its real spectrum.

Acknowledgment

We gratefully thank the Berlin Mathematical School for the financial support of Adam Nielsen.

² For the connection between cluster analysis and eigenvalues we refer to [6].

[Figures: third and fourth eigenvalue of the corrected and standard estimations, plotted against the amount of trajectories ($\times 10^7$).]

[Figures: fifth and sixth eigenvalue of the corrected and standard estimations, plotted against the amount of trajectories ($\times 10^7$).]

References

[1] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press.
[2] P. Deuflhard and M. Weber, Robust Perron cluster analysis in conformation dynamics, in: M. Dellnitz, S. Kirkland, M. Neumann and C. Schütte (eds.), Linear Algebra and Its Applications - Special Issue on Matrices and Mathematical Biology, volume 398C (2005).
[3] G. Fischer, Lineare Algebra, 17th ed., Vieweg + Teubner.
[4] N. J. Higham, Computing a nearest symmetric positive semidefinite matrix, Linear Algebra and its Applications 103 (1988).
[5] R. Horn and C. Johnson, Matrix Analysis, Cambridge University Press.
[6] W. Huisinga and B. Schmidt, Metastability and dominant eigenvalues of transfer operators, Lecture Notes in Computational Science and Engineering 49 (2006).
[7] Ph. Metzner, F. Noé and Ch. Schütte, Estimating the sampling error: Distribution of transition matrices and functions of transition matrices for given trajectory data, Physical Review E 80(2).
[8] A. Nielsen, K. Fackeldey and M. Weber, Transfer operator for arbitrary measures, in submission.
[9] R. Horst, P. M. Pardalos and N. V. Thoai, Introduction to Global Optimization, Kluwer Academic Publishers.
[10] M. Sarich, Projected transfer operators - discretization of Markov processes in high-dimensional state space, Ph.D. thesis, Freie Universität Berlin.
[11] C. Schütte, Conformational dynamics: Modelling, theory, algorithm, and application to biomolecules, habilitation thesis, Freie Universität Berlin.
[12] C. Schütte, W. Huisinga and P. Deuflhard, Transfer operator approach to conformational dynamics in biomolecular systems, in: Applied Mathematics Entering the 21st Century (2003).
[13] M. Weber, A subspace approach to molecular Markov state models via a new infinitesimal generator, habilitation thesis, Freie Universität Berlin.
