Linear Algebra and its Applications 335 (2001) 137–150
www.elsevier.com/locate/laa

Comparison of perturbation bounds for the stationary distribution of a Markov chain

Grace E. Cho (a), Carl D. Meyer (b,*)

(a) Mathematics and Engineering Analysis, The Boeing Company, P.O. Box 3707, Seattle, WA 98124-2207, USA
(b) Department of Mathematics, North Carolina State University, P.O. Box 8205, Raleigh, NC 27695-8205, USA

Received 18 November 1999; accepted November 2000
Submitted by V. Mehrmann

Abstract

The purpose of this paper is to review and compare the existing perturbation bounds for the stationary distribution of a finite, irreducible, homogeneous Markov chain. © 2001 Elsevier Science Inc. All rights reserved.

AMS classification: 65F35; 60J10; 15A51; 15A12; 15A18

Keywords: Markov chains; Stationary distribution; Stochastic matrix; Group inversion; Sensitivity analysis; Perturbation theory; Condition numbers

1. Introduction

Let P be the transition probability matrix of an n-state finite, irreducible, homogeneous Markov chain. The stationary distribution vector of P is the unique positive vector π^T satisfying

    π^T P = π^T,    Σ_{i=1}^n π_i = 1.

* Corresponding author.
E-mail addresses: grace.e.cho@boeing.com (G.E. Cho), meyer@math.ncsu.edu (C.D. Meyer).
URLs: http://www4.ncsu.edu/~ehcho/home.html (G.E. Cho), http://meyer.math.ncsu.edu/ (C.D. Meyer).
Supported in part by National Science Foundation grants CCR-9731856 and DMS-9714811.

0024-3795/01/$ - see front matter © 2001 Elsevier Science Inc. All rights reserved.
Suppose P is perturbed to a matrix P̃ that is also the transition probability matrix of an n-state finite, irreducible, homogeneous Markov chain. Denoting the stationary distribution vector of P̃ by π̃, the goal is to describe the change π − π̃ in the stationary distribution in terms of the change E ≡ P̃ − P in the transition probability matrix. For suitable norms,

    ‖π − π̃‖ ≤ κ‖E‖

for various different condition numbers κ. We review eight existing condition numbers κ_1, …, κ_8. Most of the condition numbers we consider are expressed in terms of either the fundamental matrix of the underlying Markov chain or the group inverse of I − P. The condition number κ_8 is expressed in terms of mean first passage times, providing a qualitative interpretation of the error bound. In Section 4, we compare the condition numbers.

2. Notation

Throughout the article the matrix P denotes the transition probability matrix of an n-state finite, irreducible, homogeneous Markov chain C, and π denotes its stationary distribution vector. Then

    π^T P = π^T,    π > 0,    π^T e = 1,

where e is the column vector of all ones. The perturbed matrix P̃ = P + E is the transition probability matrix of another n-state finite, irreducible, homogeneous Markov chain C̃ with stationary distribution vector π̃:

    π̃^T P̃ = π̃^T,    π̃ > 0,    π̃^T e = 1.

The identity matrix of size n is denoted by I. For a matrix B, the (i, j) component of B is denoted by b_ij. The 1-norm ‖v‖_1 of a vector v is its absolute entry sum, and the ∞-norm ‖B‖_∞ of a matrix B is its maximum absolute row sum.

3. Condition numbers of a Markov chain

The norm-wise perturbation bounds we review in this section are of the following form:

    ‖π − π̃‖_p ≤ κ_l ‖E‖_q,

where (p, q) = (1, ∞) or (∞, ∞), depending on l.
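Before turning to the individual bounds, the objects involved can be made concrete. The following minimal Python sketch (not from the paper; the chain P is an arbitrary illustrative example) computes π by replacing one equation of the singular system π^T(I − P) = 0 with the normalization π^T e = 1:

```python
import numpy as np

# Arbitrary 3-state irreducible chain (rows sum to 1); illustrative only.
P = np.array([[0.50, 0.25, 0.25],
              [0.20, 0.60, 0.20],
              [0.30, 0.30, 0.40]])

def stationary(P):
    """Stationary vector: solve (I - P)^T x = 0 together with e^T x = 1
    by overwriting the last (redundant) equation with the normalization."""
    n = len(P)
    M = (np.eye(n) - P).T
    M[-1, :] = 1.0                # replace one equation by e^T pi = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(M, b)

pi = stationary(P)                # pi^T P = pi^T, pi > 0, sum(pi) = 1
```

For an irreducible chain the modified system is nonsingular, since any n − 1 rows of (I − P)^T are linearly independent and π^T e = 1 excludes the null direction.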
Most of the perturbation bounds we will consider are in terms of one of two matrices related to the chain C: the fundamental matrix and the group inverse of A ≡ I − P. The fundamental matrix of the chain C is defined by

    Z ≡ (A + eπ^T)^{-1}.

The group inverse of A is the unique square matrix A# satisfying

    AA#A = A,    A#AA# = A#,    AA# = A#A.

The list of the condition numbers κ_l and the references is as follows:

    κ_1 = ‖Z‖_∞                                  (Schweitzer [17])
    κ_2 = ‖A#‖_∞                                 (Meyer [12])
    κ_3 = (1/2) max_j (a#_jj − min_i a#_ij)      (Haviv and Van der Heyden [5], Kirkland et al. [9])
    κ_4 = max_{i,j} |a#_ij|                      (Funderlic and Meyer [4])
    κ_5 = 1/(1 − τ_1(P))                         (Seneta [21])
    κ_6 = τ_1(A#) = τ_1(Z)                       (Seneta [22])
    κ_7 = (1/2) min_j ‖A_{(j)}^{-1}‖_∞           (Ipsen and Meyer [6], Kirkland et al. [9])
    κ_8 = (1/2) max_j [ max_{i≠j} m_ij / m_jj ]  (Cho and Meyer [1])

where m_ij, i ≠ j, is the mean first passage time from state i to state j, and m_jj is the mean return time of state j.

3.1. Schweitzer [17]

Kemeny and Snell [8] call Z the fundamental matrix of the chain because most of the questions concerning the chain can be answered in terms of Z. For instance, the stationary distribution vector of the perturbed matrix P̃ can be expressed in terms of π and Z:

    π̃^T = π^T (I − EZ)^{-1}  and  π̃^T − π^T = π̃^T EZ.    (3.1)

Eq. (3.1), by Schweitzer [17], gives the first perturbation bound:

    ‖π − π̃‖_1 ≤ ‖Z‖_∞ ‖E‖_∞.
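The identity and the resulting bound can be checked numerically. The sketch below is illustrative only (the two chains are arbitrary choices, not from the paper); it builds Z directly from its definition:

```python
import numpy as np

def stationary(P):
    # solve pi^T (I - P) = 0 with the last equation replaced by e^T pi = 1
    n = len(P)
    M = (np.eye(n) - P).T
    M[-1, :] = 1.0
    return np.linalg.solve(M, np.r_[np.zeros(n - 1), 1.0])

P  = np.array([[0.50, 0.25, 0.25],      # original chain
               [0.20, 0.60, 0.20],
               [0.30, 0.30, 0.40]])
Pt = np.array([[0.48, 0.27, 0.25],      # perturbed chain P~ (rows sum to 1)
               [0.20, 0.58, 0.22],
               [0.33, 0.30, 0.37]])
pi, pit = stationary(P), stationary(Pt)
E = Pt - P
n = len(P)

# fundamental matrix Z = (I - P + e pi^T)^{-1}
Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))

# exact identity: pi~^T - pi^T = pi~^T E Z
assert np.allclose(pit - pi, pit @ E @ Z)

# Schweitzer bound: ||pi - pi~||_1 <= ||Z||_inf ||E||_inf
kappa1 = np.abs(Z).sum(axis=1).max()
err1   = np.abs(pi - pit).sum()
assert err1 <= kappa1 * np.abs(E).sum(axis=1).max()
```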
We define κ_1 ≡ ‖Z‖_∞.

3.2. Meyer [12]

The second matrix related to C is the group inverse of A. In his papers [11,12], Meyer showed that the group inverse A# can be used in much the same way as Z and, since Z = A# + eπ^T, all relevant information is contained in A#; the term eπ^T is redundant [4]. In fact, in place of (3.1), we have

    π̃^T = π^T (I − EA#)^{-1}  and  π̃^T − π^T = π̃^T EA#,    (3.2)

and the resulting perturbation bound is [12]

    ‖π − π̃‖_1 ≤ ‖A#‖_∞ ‖E‖_∞.

We define the second condition number κ_2 ≡ ‖A#‖_∞.

3.3. Haviv and Van der Heyden [5] and Kirkland et al. [9]

The perturbation bound in this section is derived from (3.2) with the use of the following lemma:

Lemma 3.1. For any vector d and any vector c such that c^T e = 0,

    |c^T d| ≤ (1/2) max_{i,j} |d_i − d_j| ‖c‖_1.

The resulting perturbation bound is

    ‖π − π̃‖_∞ ≤ (1/2) max_j (a#_jj − min_i a#_ij) ‖E‖_∞

(cf. Refs. [5,9]). We define κ_3 ≡ (1/2) max_j (a#_jj − min_i a#_ij).
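A short numerical sketch of this machinery (the chains are the same arbitrary examples as before, and A# is obtained here as Z − eπ^T, which follows from Z = A# + eπ^T):

```python
import numpy as np

def stationary(P):
    n = len(P)
    M = (np.eye(n) - P).T
    M[-1, :] = 1.0
    return np.linalg.solve(M, np.r_[np.zeros(n - 1), 1.0])

P  = np.array([[0.50, 0.25, 0.25],
               [0.20, 0.60, 0.20],
               [0.30, 0.30, 0.40]])
Pt = np.array([[0.48, 0.27, 0.25],
               [0.20, 0.58, 0.22],
               [0.33, 0.30, 0.37]])
pi, pit = stationary(P), stationary(Pt)
n, E = len(P), Pt - P

A   = np.eye(n) - P
Z   = np.linalg.inv(A + np.outer(np.ones(n), pi))
Ash = Z - np.outer(np.ones(n), pi)          # group inverse A# = Z - e pi^T

# A# satisfies the three defining equations of the group inverse
assert np.allclose(A @ Ash @ A, A)
assert np.allclose(Ash @ A @ Ash, Ash)
assert np.allclose(A @ Ash, Ash @ A)

Einf   = np.abs(E).sum(axis=1).max()                              # ||E||_inf
kappa2 = np.abs(Ash).sum(axis=1).max()                            # ||A#||_inf
kappa3 = 0.5 * max(Ash[j, j] - Ash[:, j].min() for j in range(n))

assert np.abs(pi - pit).sum() <= kappa2 * Einf   # Meyer bound (1-norm)
assert np.abs(pi - pit).max() <= kappa3 * Einf   # Haviv/Van der Heyden bound
```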
3.4. Funderlic and Meyer [4]

Eq. (3.2) provides a component-wise bound for the stationary distribution vector:

    |π_j − π̃_j| ≤ max_i |a#_ij| ‖E‖_∞,

which leads us to

    ‖π − π̃‖_∞ ≤ max_{i,j} |a#_ij| ‖E‖_∞.

Funderlic and Meyer [4] called the number κ_4 ≡ max_{i,j} |a#_ij| the chain condition number. The behaviour of κ_4 is described by Meyer [13,14]. He showed that the size of κ_4 is primarily governed by how close the subdominant eigenvalues of the chain are to 1. (This is not true for arbitrary matrices; for an example, see [14].) To be precise, denoting the eigenvalues of P by 1, λ_2, …, λ_n, the lower and upper bounds of κ_4 are given by

    1/(n min_i |1 − λ_i|) ≤ max_{i,j} |a#_ij| < (n − 1)/∏_i |1 − λ_i|.    (3.3)

Hence, if the chain is well-conditioned, then all subdominant eigenvalues must be well separated from 1, and if all subdominant eigenvalues are well separated from 1, then the chain must be well-conditioned.

In [13,14], it is indicated that the upper bound (n − 1)/∏_i |1 − λ_i| in (3.3) is a rather conservative estimate of κ_4. If no single eigenvalue of P is extremely close to 1, but enough eigenvalues are within range of 1, then (n − 1)/∏_i |1 − λ_i| is large, even if the chain is not too badly conditioned. Seneta [23] provides a condition number, with bounds, that overcomes this problem. (See Section 3.6.)

3.5. Seneta [21]

The condition numbers κ_1, κ_2, and κ_4 are in terms of matrix norms. Seneta [21], however, proposed the ergodicity coefficient instead of the matrix norm. The ergodicity coefficient τ_1(B) of a matrix B with equal row sums b is defined by

    τ_1(B) ≡ sup { ‖v^T B‖_1 : ‖v‖_1 = 1, v^T e = 0 }.
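For computation, τ_1 need not be evaluated as a supremum: for a matrix with equal row sums it reduces to the known explicit form τ_1(B) = (1/2) max_{i,k} Σ_j |b_ij − b_kj|, half the largest 1-norm distance between two rows (cf. the explicit-form references such as [20]). A sketch, with an arbitrary example chain:

```python
import numpy as np

def tau1(B):
    # explicit form of the ergodicity coefficient for a matrix with
    # equal row sums: half the largest l1-distance between two rows
    return max(0.5 * np.abs(B[i] - B[k]).sum()
               for i in range(len(B)) for k in range(len(B)))

P = np.array([[0.50, 0.25, 0.25],     # arbitrary example chain
              [0.20, 0.60, 0.20],
              [0.30, 0.30, 0.40]])

t = tau1(P)                            # for this P, t = 0.35

# spot-check against the sup definition: ||v^T P||_1 <= tau1(P)
# whenever ||v||_1 = 1 and v^T e = 0
rng = np.random.default_rng(0)
for _ in range(1000):
    v = rng.standard_normal(3)
    v -= v.mean()                      # enforce v^T e = 0
    v /= np.abs(v).sum()               # enforce ||v||_1 = 1
    assert np.abs(v @ P).sum() <= t + 1e-12
```

The explicit form holds because the extreme points of the feasible set {v : ‖v‖_1 ≤ 1, v^T e = 0} are the vectors (e_i − e_k)/2.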
Note that τ_1(B) is an ordinary norm of B restricted to the hyperplane H_n ≡ {v : v ∈ R^n, v^T e = 0} of R^n. In particular, every eigenvalue λ of B other than the common row sum b satisfies |λ| ≤ τ_1(B), so τ_1(B) carries information about the subdominant eigenvalues. For more discussion and study, we refer readers to [2,3,7,10,15,16,18–23,25,26].

For the stochastic matrix P, the ergodicity coefficient satisfies 0 ≤ τ_1(P) ≤ 1. In the case τ_1(P) < 1, we have a perturbation bound in terms of the ergodicity coefficient of P [21]:

    ‖π − π̃‖_1 ≤ ‖E‖_∞ / (1 − τ_1(P)).    (3.4)

(For the case τ_1(P) = 1, see [21].) We denote κ_5 ≡ 1/(1 − τ_1(P)).

3.6. Seneta [22]

In the previous sections we noted that the group inverse A# of A = I − P can be used in place of Kemeny and Snell's fundamental matrix Z. In fact, if we use ergodicity coefficients as a measure of the sensitivity of the stationary distribution, then Z and A# give exactly the same information:

    κ_6 ≡ τ_1(A#) = τ_1(Z),

which is the condition number in the perturbation bound given by Seneta [22]:

    ‖π − π̃‖_1 ≤ τ_1(A#) ‖E‖_∞  (= τ_1(Z) ‖E‖_∞).

In Section 3.4, we observed that the size of the condition number κ_4 = max_{i,j} |a#_ij| is governed by the closeness of the subdominant eigenvalues of P to 1, giving the lower and upper bounds for κ_4. However, the upper bound for κ_4 is an overestimate when enough eigenvalues of P are within range of 1, even if no single eigenvalue λ_i of P is close to 1. The following bounds for κ_6 overcome this problem:

    1/min_i |1 − λ_i| ≤ τ_1(A#) ≤ Σ_{i=2}^n 1/|1 − λ_i| ≤ (n − 1)/min_i |1 − λ_i|.

Unlike the upper bound in (3.3) for κ_4, the rightmost upper bound for κ_6 takes only the eigenvalue closest to 1 into account. Hence, it shows that as long as the closest eigenvalue of P is not close to 1, the chain is well-conditioned.
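The equality τ_1(A#) = τ_1(Z), the two ergodicity-coefficient bounds, and the eigenvalue bounds above can all be checked numerically. A sketch under the same arbitrary example chains as before (not the authors' code):

```python
import numpy as np

def stationary(P):
    n = len(P)
    M = (np.eye(n) - P).T
    M[-1, :] = 1.0
    return np.linalg.solve(M, np.r_[np.zeros(n - 1), 1.0])

def tau1(B):   # (1/2) max_{i,k} sum_j |b_ij - b_kj|
    return max(0.5 * np.abs(B[i] - B[k]).sum()
               for i in range(len(B)) for k in range(len(B)))

P  = np.array([[0.50, 0.25, 0.25],
               [0.20, 0.60, 0.20],
               [0.30, 0.30, 0.40]])
Pt = np.array([[0.48, 0.27, 0.25],
               [0.20, 0.58, 0.22],
               [0.33, 0.30, 0.37]])
pi, pit = stationary(P), stationary(Pt)
n, E = len(P), Pt - P

Z   = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
Ash = Z - np.outer(np.ones(n), pi)              # group inverse A#

kappa5 = 1.0 / (1.0 - tau1(P))                  # requires tau1(P) < 1
kappa6 = tau1(Ash)
assert np.isclose(kappa6, tau1(Z))              # Z and A# agree under tau1

err1, Einf = np.abs(pi - pit).sum(), np.abs(E).sum(axis=1).max()
assert err1 <= kappa6 * Einf <= kappa5 * Einf   # and kappa6 <= kappa5

# eigenvalue bounds: 1/min|1-lam| <= tau1(A#) <= sum_i 1/|1-lam_i|
lam  = np.linalg.eigvals(P)
sub  = np.delete(lam, np.argmin(np.abs(lam - 1)))   # drop the eigenvalue 1
gaps = np.abs(1 - sub)
assert 1 / gaps.min() <= kappa6 + 1e-9
assert kappa6 <= (1 / gaps).sum() + 1e-9
```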
3.7. Ipsen and Meyer [6], and Kirkland et al. [9]

Ipsen and Meyer [6] derived a set of perturbation bounds and showed that all stationary probabilities react in a uniform manner to perturbations in the transition probabilities. The main result of their paper is based on the following component-wise perturbation bounds:

    |π_k − π̃_k| ≤ ‖A_{(j)}^{-1}‖_∞ ‖E‖_∞  for all j = 1, 2, …, n and all k = 1, 2, …, n,    (3.5)

so that

    ‖π − π̃‖_∞ ≤ min_j ‖A_{(j)}^{-1}‖_∞ ‖E‖_∞,

where A_{(j)} is the principal submatrix of A obtained by deleting the jth row and column from A. Kirkland et al. [9] improved the perturbation bound in (3.5) by a factor of 2:

    ‖π − π̃‖_∞ < (1/2) min_j ‖A_{(j)}^{-1}‖_∞ ‖E‖_∞,

giving the condition number κ_7 ≡ (1/2) min_j ‖A_{(j)}^{-1}‖_∞.

3.8. Cho and Meyer [1]

In the previous sections, we saw a number of perturbation bounds for the stationary distribution vector of an irreducible Markov chain. Unfortunately, they provide little qualitative information about the sensitivity of the underlying Markov chain. Moreover, the actual computation of the corresponding condition number is usually expensive relative to the computation of the stationary distribution vector itself. In this section a perturbation bound is presented in terms of the structure of the underlying Markov chain. To be more precise, the condition number κ_8 is in terms of mean first passage times. For an n-state irreducible Markov chain C, the mean first passage time m_ij from state i to state j (j ≠ i) is defined to be the expected number of steps to enter state j for the first time, starting from state i. The mean return time m_jj of state j is the expected number of steps to return to state j for the first time, starting from state j. Using the relationships m_jj = 1/π_j and a#_ij = a#_jj − π_j m_ij, i ≠ j, we obtain the perturbation bound
    ‖π − π̃‖_∞ ≤ (1/2) max_j [ max_{i≠j} m_ij / m_jj ] ‖E‖_∞

(cf. [1]). We define

    κ_8 ≡ (1/2) max_j [ max_{i≠j} m_ij / m_jj ].

Viewing sensitivity in terms of mean first passage times can sometimes help practitioners decide whether or not to expect sensitivity in their Markov chain models merely by observing the structure of the chain, without computing or estimating condition numbers. For example, consider chains consisting of a dominant central state with strong connections to and from all other states. Physical systems of this type have historically been called mammillary systems (see [24,27]). The simplest example of a mammillary Markov chain is one whose transition probability matrix has the form

    P = [ p_1  0    ⋯  0    1 − p_1
          0    p_2  ⋯  0    1 − p_2
          ⋮             ⋱    ⋮
          0    0    ⋯  p_k  1 − p_k
          q_1  q_2  ⋯  q_k  1 − q ],    q = q_1 + ⋯ + q_k,    (3.6)

in which the p_i's and q_i's are not unduly small (say, p_i > .5 and q_i ≥ 1/(2k)). Mean first passage times m_ij, i ≠ j, in mammillary structures are never very large. For example, in the simple mammillary chain defined by (3.6) it can be demonstrated that

    m_ij = 1/(1 − p_i)                                  when j = k + 1,
    m_ij = (1 + σ)/q_j − 1/(1 − p_j)                    when i = k + 1,
    m_ij = 1/(1 − p_i) + (1 + σ)/q_j − 1/(1 − p_j)      when i, j ≠ k + 1,

where σ = Σ_{h=1}^k q_h/(1 − p_h). Since each m_jj ≥ 1, it is clear that κ_8 cannot be large, and consequently no stationary probability can be unduly sensitive to perturbations in P. It is apparent that similar remarks hold for more general mammillary structures as well.

4. Comparison of condition numbers

In this section, we compare the condition numbers κ_l. The norm-wise perturbation bounds in Section 3 are of the following form:

    ‖π − π̃‖_p ≤ κ_l ‖E‖_q,
where (p, q) = (1, ∞) or (∞, ∞), depending on l. (For κ_7, a strict inequality holds.) The purpose of this section is simply to compare the condition numbers appearing in those norm-wise bounds. Therefore we use the form

    ‖π − π̃‖ ≤ κ_l ‖E‖

for all condition numbers, although some of the bounds are tighter in this form. (See Remark 4.1.)

Lemma 4.1.
(a) max_j max_{i,k} |a#_ij − a#_kj| = max_j (a#_jj − min_i a#_ij),
(b) (1/2) max_j (a#_jj − min_i a#_ij) ≤ max_{i,j} |a#_ij| < max_j (a#_jj − min_i a#_ij),
(c) max_{i,j} |a#_ij| ≤ ‖A#‖_∞,
(d) max_j (a#_jj − min_i a#_ij) ≤ τ_1(A#),
(e) τ_1(A#) ≤ (n/2) max_j (a#_jj − min_i a#_ij),
(f) τ_1(A#) ≤ ‖A#‖_∞,  τ_1(A#) ≤ ‖Z‖_∞,  τ_1(A#) ≤ 1/(1 − τ_1(P)),
(g) ‖A#‖_∞ − 1 ≤ ‖Z‖_∞ ≤ ‖A#‖_∞ + 1.

Proof. (a) Using a symmetric permutation, we may assume that the column of interest occupies the last position. Partition A and π as

    A = [ A_{(n)}  c ;  d^T  a_nn ],    π = [ π_{(n)} ; π_n ],

where A_{(n)} is the principal submatrix of A obtained by deleting the nth row and column from A. Since rank A = n − 1, the relationship a_nn = d^T A_{(n)}^{-1} c holds, so that

    A# = [ (I − eπ_{(n)}^T) A_{(n)}^{-1} (I − eπ_{(n)}^T)    −π_n (I − eπ_{(n)}^T) A_{(n)}^{-1} e ;
           −π_{(n)}^T A_{(n)}^{-1} (I − eπ_{(n)}^T)           π_n π_{(n)}^T A_{(n)}^{-1} e ].

Hence

    a#_in = −π_n e_i^T A_{(n)}^{-1} e + π_n π_{(n)}^T A_{(n)}^{-1} e,  i ≠ n,
    a#_nn = π_n π_{(n)}^T A_{(n)}^{-1} e,    (4.1)
where e_i is the ith column of I. By M-matrix properties, it follows that A_{(n)}^{-1} > 0, so that π_n e_i^T A_{(n)}^{-1} e > 0 and π_n π_{(n)}^T A_{(n)}^{-1} e > 0. Thus max_i a#_in = a#_nn, and the result follows.

(b) For each j, a#_jj > 0 by (4.1). Since π^T A# = 0^T, it follows that there exists k_0 such that a#_{k_0 j} < 0. Furthermore, let i_0 be such that max_i |a#_ij| = |a#_{i_0 j}|. Then

    |a#_{i_0 j}| < a#_{i_0 j} − a#_{k_0 j}  if a#_{i_0 j} ≥ 0,
    |a#_{i_0 j}| < a#_jj − a#_{i_0 j}       otherwise.

Thus, for all j, max_i |a#_ij| < max_{i,k} |a#_ij − a#_kj|, so that

    max_{i,j} |a#_ij| < max_j max_{i,k} |a#_ij − a#_kj|.

On the other hand,

    max_j max_{i,k} |a#_ij − a#_kj| ≤ max_{i,j} |a#_ij| + max_{k,j} |a#_kj| = 2 max_{i,j} |a#_ij|.

Now the assertion follows from (a).

(c) It follows directly from the definitions of the entrywise maximum and the ∞-norm.

(d) For a real number a, define a^+ ≡ max {a, 0}. Then

    max_j max_{i,k} |a#_ij − a#_kj| ≤ max_{i,k} Σ_j (a#_ij − a#_kj)^+ = τ_1(A#)  (by Seneta [19]),

and the assertion follows by (a).

(e) For any vector x ∈ R^n, ‖x‖_1 ≤ n ‖x‖_∞, and the result follows, since
    τ_1(A#) = (1/2) max_{i,k} ‖A#_i − A#_k‖_1  and  max_j max_{i,k} |a#_ij − a#_kj| = max_{i,k} ‖A#_i − A#_k‖_∞,

where B_i denotes the ith row of a matrix B.

(f) Since for any matrix B with equal row sums

    τ_1(B) = sup { ‖v^T B‖_1 : ‖v‖_1 = 1, v^T e = 0 }

and ‖y^T B‖_1 ≤ ‖y‖_1 ‖B‖_∞ for any vector y ∈ R^n, we have

    τ_1(B) ≤ ‖B‖_∞.

The inequalities follow from this relationship together with the following facts, proven in [21]:

    τ_1(A#) = τ_1(Z),  and if τ_1(P) < 1, then τ_1(A#) ≤ 1/(1 − τ_1(P)).

(g) It follows by applying the triangle inequality to Z = A# + eπ^T and A# = Z − eπ^T, since ‖eπ^T‖_∞ = 1. □

The following list summarizes the relationships between the condition numbers:

Relations between condition numbers:

    κ_8 = κ_3 ≤ κ_4 < 2κ_3 ≤ κ_6 ≤ κ_l  for l = 1, 2, 5,
    κ_6 ≤ nκ_3,
    κ_2 − 1 ≤ κ_1 ≤ κ_2 + 1.
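As a sanity check (not part of the paper), the relations in this list, together with the definitions of κ_7 and the mean-first-passage-time form of κ_8, can be verified numerically on a randomly generated chain:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
P = rng.random((n, n)) + 0.1           # strictly positive rows => irreducible
P /= P.sum(axis=1, keepdims=True)

I = np.eye(n)
M = (I - P).T.copy()
M[-1, :] = 1.0
pi = np.linalg.solve(M, np.r_[np.zeros(n - 1), 1.0])   # stationary vector

A   = I - P
Z   = np.linalg.inv(A + np.outer(np.ones(n), pi))
Ash = Z - np.outer(np.ones(n), pi)                      # group inverse A#

def tau1(B):
    return max(0.5 * np.abs(B[i] - B[k]).sum()
               for i in range(n) for k in range(n))

def sub(B, j):                                          # delete row and column j
    return np.delete(np.delete(B, j, axis=0), j, axis=1)

# mean first passage times: (I - P_(j)) m_(j) = e, and m_jj = 1/pi_j
Mfp = np.zeros((n, n))
for j in range(n):
    others = [i for i in range(n) if i != j]
    Mfp[others, j] = np.linalg.solve(sub(A, j), np.ones(n - 1))
    Mfp[j, j] = 1.0 / pi[j]

k1 = np.abs(Z).sum(axis=1).max()
k2 = np.abs(Ash).sum(axis=1).max()
k3 = 0.5 * max(Ash[j, j] - Ash[:, j].min() for j in range(n))
k4 = np.abs(Ash).max()
k5 = 1.0 / (1.0 - tau1(P))
k6 = tau1(Ash)
k7 = 0.5 * min(np.abs(np.linalg.inv(sub(A, j))).sum(axis=1).max()
               for j in range(n))
k8 = 0.5 * max(Mfp[[i for i in range(n) if i != j], j].max() / Mfp[j, j]
               for j in range(n))

assert np.isclose(k3, k8)                    # kappa_8 = kappa_3
assert k3 <= k4 + 1e-12 and k4 < 2 * k3      # kappa_3 <= kappa_4 < 2 kappa_3
assert 2 * k3 <= k6 + 1e-12                  # 2 kappa_3 <= kappa_6
assert k6 <= min(k1, k2, k5) + 1e-12         # kappa_6 <= kappa_1, kappa_2, kappa_5
assert k6 <= n * k3 + 1e-12                  # kappa_6 <= n kappa_3
assert abs(k1 - k2) <= 1 + 1e-12             # kappa_2 - 1 <= kappa_1 <= kappa_2 + 1
```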
Remark 4.1. As remarked at the beginning of this section, some of the condition numbers provide tighter bounds than the form used in this section. To be more precise, for l = 1, 2, 5, 6,

    ‖π − π̃‖_∞ / ‖E‖_∞ ≤ ‖π − π̃‖_1 / ‖E‖_∞ ≤ κ_l.

To give a fairer comparison for these condition numbers, note that (π − π̃)^T e = 0 implies ‖π − π̃‖_∞ ≤ (1/2) ‖π − π̃‖_1, so that

    ‖π − π̃‖_∞ ≤ κ'_l ‖E‖_∞,  where κ'_l ≡ (1/2) κ_l, for l = 1, 2, 5, 6.

The comparison of these new condition numbers κ'_l with κ_3 is as given above: κ_3 ≤ κ'_l for l = 1, 2, 5, 6.

5. Concluding remarks

We reviewed eight existing perturbation bounds and compared the corresponding condition numbers. The list at the end of the last section clarifies the relationships between seven of the condition numbers. Among these seven, the smallest condition number is

    κ_3 ≡ (1/2) max_j (a#_jj − min_i a#_ij)

by Haviv and Van der Heyden [5] and Kirkland et al. [9] or, equivalently,

    κ_8 ≡ (1/2) max_j [ max_{i≠j} m_ij / m_jj ]

by Cho and Meyer [1]. The only condition number not included in the comparison is κ_7 = (1/2) min_j ‖A_{(j)}^{-1}‖_∞.¹ Is κ_3 a smaller condition number than κ_7? Since, by (4.1),

    a#_jj − min_i a#_ij = π_j ‖A_{(j)}^{-1}‖_∞

¹ The authors thank the referee for bringing this point to our attention.
for each j, the question is whether

    max_j π_j ‖A_{(j)}^{-1}‖_∞ ≤ min_j ‖A_{(j)}^{-1}‖_∞    (5.1)

holds. In other words, is π_j always small enough that max_j π_j ‖A_{(j)}^{-1}‖_∞ is never larger than min_j ‖A_{(j)}^{-1}‖_∞? This question leads us to the relationship between a stationary probability π_j and the corresponding ‖A_{(j)}^{-1}‖_∞. In a special case, the relationship is clear.

Lemma 5.1. If P is of rank 1, then ‖A_{(j)}^{-1}‖_∞ = 1/π_j for all j = 1, 2, …, n.

The proof appears in [9, Observation 3.3]. This relationship does not hold for an arbitrary irreducible Markov chain. In general, however, ‖A_{(j)}^{-1}‖_∞ tends to increase as π_j decreases, and vice versa.

Notice that κ_8 is equal to κ_3. However, viewing sensitivity in terms of mean first passage times can sometimes help practitioners decide whether or not to expect sensitivity in their Markov chain models merely by observing the structure of the chain, thus obviating the need for computing or estimating condition numbers.

References

[1] G. Cho, C. Meyer, Markov chain sensitivity measured by mean first passage times, Linear Algebra Appl. 316 (2000) 21–28.
[2] R. Dobrushin, Central limit theorem for nonstationary Markov chains. I, Theory Probab. Appl. 1 (1956) 65–80.
[3] R. Dobrushin, Central limit theorem for nonstationary Markov chains. II, Theory Probab. Appl. 1 (1956) 329–383.
[4] R. Funderlic, C. Meyer, Sensitivity of the stationary distribution vector for an ergodic Markov chain, Linear Algebra Appl. 76 (1986) 1–17.
[5] M. Haviv, L. Van der Heyden, Perturbation bounds for the stationary probabilities of a finite Markov chain, Adv. Appl. Probab. 16 (1984) 804–818.
[6] I. Ipsen, C. Meyer, Uniform stability of Markov chains, SIAM J. Matrix Anal. Appl. 15 (1994) 1061–1074.
[7] D. Isaacson, R. Madsen, Markov Chains: Theory and Applications, Wiley, New York, 1976.
[8] J. Kemeny, J. Snell, Finite Markov Chains, Van Nostrand, New York, 1960.
[9] S. Kirkland, M. Neumann, B. Shader, Applications of Paz's inequality to perturbation bounds for Markov chains, Linear Algebra Appl. 268 (1998) 183–196.
[10] A. Lešanovský, Coefficients of ergodicity generated by non-symmetrical vector norms, Czechoslovak Math. J. 40 (1990) 284–294.
[11] C. Meyer, The role of the group generalized inverse in the theory of finite Markov chains, SIAM Rev. 17 (1975) 443–464.
[12] C. Meyer, The condition of a finite Markov chain and perturbation bounds for the limiting probabilities, SIAM J. Algebraic Discrete Methods 1 (1980) 273–283.
[13] C. Meyer, The character of a finite Markov chain, in: C.D. Meyer, R.J. Plemmons (Eds.), Proceedings of the IMA Workshop on Linear Algebra, Markov Chains, and Queueing Models, Springer, Berlin, 1993, pp. 47–58.
[14] C. Meyer, Sensitivity of the stationary distribution of a Markov chain, SIAM J. Matrix Anal. Appl. 15 (1994) 715–728.
[15] A. Rhodius, On explicit forms for ergodicity coefficients, Linear Algebra Appl. 194 (1993) 71–83.
[16] U. Rothblum, C. Tan, Upper bounds on the maximum modulus of subdominant eigenvalues of nonnegative matrices, Linear Algebra Appl. 66 (1985) 45–86.
[17] P. Schweitzer, Perturbation theory and finite Markov chains, J. Appl. Probab. 5 (1968) 401–413.
[18] E. Seneta, Coefficients of ergodicity: structure and applications, Adv. Appl. Probab. 11 (1979) 576–590.
[19] E. Seneta, Non-negative Matrices and Markov Chains, Springer, New York, 1981.
[20] E. Seneta, Explicit forms for ergodicity coefficients and spectrum localization, Linear Algebra Appl. 60 (1984) 187–197.
[21] E. Seneta, Perturbation of the stationary distribution measured by ergodicity coefficients, Adv. Appl. Probab. 20 (1988) 228–230.
[22] E. Seneta, Sensitivity analysis, ergodicity coefficients, and rank-one updates for finite Markov chains, in: W.J. Stewart (Ed.), Numerical Solution of Markov Chains, Marcel Dekker, New York, 1991, pp. 121–129.
[23] E. Seneta, Sensitivity of finite Markov chains under perturbation, Statist. Probab. Lett. 17 (1993) 163–168.
[24] C. Sheppard, A. Householder, The mathematical basis of the interpretation of tracer experiments in closed steady-state systems, J. Appl. Phys. 22 (1951) 510–520.
[25] C. Tan, A functional form for a coefficient of ergodicity, J. Appl. Probab. 19 (1982) 858–863.
[26] C. Tan, Coefficients of ergodicity with respect to vector norms, J. Appl. Probab. 20 (1983) 277–287.
[27] R. Whittaker, Experiments with radiophosphorus tracer in aquarium microcosms, Ecological Monographs 31 (1961) 157–188.