Perturbation results for nearly uncoupled Markov chains with applications to iterative methods

Jesse L. Barlow*

December 9, 1992

Abstract

The standard perturbation theory for linear equations states that nearly uncoupled Markov chains (NUMCs) are very sensitive to small changes in their elements. Indeed, some algorithms, such as standard Gaussian elimination, will obtain poor results for such problems. A structured perturbation theory is given that shows that NUMCs usually lead to well conditioned problems. It is shown that, with appropriate stopping criteria, iterative aggregation/disaggregation algorithms will achieve these structured error bounds. A variant of Gaussian elimination due to Grassman, Taksar, and Heyman was recently shown by O'Cinneide to achieve such bounds.

Keywords: Structured error bounds, aggregation/disaggregation, eigenvector, stopping criteria.

AMS(MOS) Subject Classifications: 65F05, 65F15, 65F20, 65G05.

1 Introduction

We consider the problem of solving the homogeneous system of linear equations

  $Ap = 0$  (1)

*Department of Computer Science, The Pennsylvania State University, University Park, PA 16802-6103. E-mail: barlow@cs.psu.edu. Supported by the National Science Foundation under grant CCR-9526 and its renewal, grant CCR-92692. This research was done in part during the author's visit to the Institute for Mathematics and its Applications, 514 Vincent Hall, 206 Church St. S.E., University of Minnesota, Minneapolis, MN 55455.
subject to the constraint

  $c^T p = 1$  (2)

where $A \in \mathbb{R}^{n \times n}$ is a singular M-matrix of rank $n-1$, $c, p \in \mathbb{R}^n$, and

  $c^T A = 0^T$.

Here $c = (1, 1, \ldots, 1)^T$ and $A = I - P^T$, where $P$ is a row stochastic matrix. Thus $p = (\pi_1, \pi_2, \ldots, \pi_n)^T$ is a right eigenvector of $P^T$, and $c$ is a left eigenvector, corresponding to the eigenvalue one. The application is that of finding the stationary distribution of a Markov chain.

Our special interest in this paper is in "nearly uncoupled Markov chains (NUMC)." For the NUMC problem, the transition matrix $P$ will have the form

  $P^T = \begin{pmatrix} P_{11} & E_{12} & \cdots & E_{1t} \\ E_{21} & P_{22} & \cdots & E_{2t} \\ \vdots & & \ddots & \vdots \\ E_{t1} & E_{t2} & \cdots & P_{tt} \end{pmatrix}$

where all of the elements of the off-diagonal blocks $E_{ij}$ are small. Here each $P_{ii}$ is an $m_i \times m_i$ matrix and each $E_{ij}$ is an $m_i \times m_j$ matrix. Let $\epsilon$ be defined by

  $\epsilon = \max_{1 \le j \le t} \sum_{i \ne j} \|E_{ij}\|_1$

and let $F_{ij} = \epsilon^{-1} E_{ij}$. Clearly,

  $\sum_{i \ne j} \|F_{ij}\|_1 \le 1$.

For convenience, let

  $B_{ii} = I - P_{ii}, \quad i = 1, 2, \ldots, t$,

and thus

  $A = \begin{pmatrix} B_{11} & -E_{12} & \cdots & -E_{1t} \\ -E_{21} & B_{22} & \cdots & -E_{2t} \\ \vdots & & \ddots & \vdots \\ -E_{t1} & -E_{t2} & \cdots & B_{tt} \end{pmatrix}$.  (3)

We discuss how accurately we can expect to solve the NUMC and show how this can be applied to aggregation/disaggregation methods for these problems.
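As a concrete illustration of this block structure, the following sketch (our own example, not from the paper; the block sizes, the random probabilities, and the helper name `make_numc` are invented) builds a small two-block NUMC and solves (1)-(2) directly:

```python
import numpy as np

def make_numc(eps, rng):
    """Row-stochastic P with two strongly coupled diagonal blocks and
    off-diagonal coupling blocks whose entries are O(eps)."""
    m1, m2 = 3, 2                            # example block sizes m_1, m_2
    P11 = rng.random((m1, m1))               # strong within-block transitions
    P22 = rng.random((m2, m2))
    E12 = eps * rng.random((m1, m2))         # weak coupling blocks E_ij
    E21 = eps * rng.random((m2, m1))
    P = np.block([[P11, E12], [E21, P22]])
    return P / P.sum(axis=1, keepdims=True)  # normalize rows to sum to one

rng = np.random.default_rng(0)
P = make_numc(1e-6, rng)
n = P.shape[0]
A = np.eye(n) - P.T                          # singular M-matrix of rank n-1

# Solve A p = 0 subject to c^T p = 1: since c^T A = 0^T the rows of A are
# dependent, so drop one row and append the normalization c^T.
M = np.vstack([A[:-1], np.ones(n)])
p = np.linalg.solve(M, np.r_[np.zeros(n - 1), 1.0])
```

Because $c^T A = 0^T$, the dropped equation is redundant, so the computed $p$ solves the full system (1)-(2).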
Stewart, Stewart, and McAllister [12, 6] have shown that such methods converge very quickly for the NUMC. Quite recently, O'Cinneide [9] has shown that the variant of Gaussian elimination due to Grassman, Taksar, and Heyman [5] obtains small relative errors in the components of $p$ for all irreducible, acyclic Markov chains, thus satisfying even better error bounds than are given in this paper. However, iterative methods can be used to solve larger problems, thus a more general perturbation theory is necessary.

The conditioning aspects will be described in terms of the group inverse. For a matrix $A$, the group inverse $A^{\#}$ is the unique matrix such that

  1. $A A^{\#} A = A$;  2. $A^{\#} A A^{\#} = A^{\#}$;  3. $A A^{\#} = A^{\#} A$.

The group inverse exists if and only if $\mathrm{rank}(A) = \mathrm{rank}(A^2)$, and the latter condition holds since zero is a simple eigenvalue of $A$. As shown by Meyer [7], it yields a more elegant characterization of the problem (1)-(2) than does the Moore-Penrose inverse. The group inverse equals the Moore-Penrose inverse if and only if $c = p$.

In [2] (see also Geurts [4]) we used the fact that if $\tilde{p}$ is a solution of

  $A\tilde{p} = r, \quad c^T \tilde{p} = 1 + \omega$

and $\Delta p = \tilde{p} - p$, then

  $\Delta p = A^{\#} r + \omega p$.

That leads to the normwise bound

  $\|\Delta p\| \le \|A^{\#}\| \|r\| + |\omega|$.  (4)

Thus the error characterization is quite elegant. There is a similar, but less elegant, characterization using the Moore-Penrose inverse [1]. Meyer and Stewart [8] describe the strong relationship between $\|A^{\#}\|_2$ and the sep() function that is commonly used to bound the error in eigenvectors [11]. Funderlic and Meyer [3] were the first to use this to characterize the condition of a Markov chain.

For the NUMC problem, $\|A^{\#}\|$ will be $O(\epsilon^{-1})$, thus bounds from (4) will be much too conservative. In section two, we give conditioning bounds on this problem as $\epsilon \to 0$. Some of the results in this section extend those of Zhang [14]. In section three, we demonstrate the relevance of our analysis to aggregation/disaggregation methods with appropriate stopping criteria.
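For small examples the group inverse can be formed explicitly. The sketch below (our own illustration on a made-up 4-state chain) uses the standard identity $A^{\#} = (A + pc^T)^{-1} - pc^T$, which follows from $Ap = 0$, $c^T A = 0^T$, $c^T p = 1$ but is not stated in the paper, to check the three defining properties and the bound (4):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)       # row-stochastic transition matrix
c = np.ones(n)
A = np.eye(n) - P.T                     # A p = 0, c^T A = 0^T

# stationary distribution: drop a redundant row, append c^T p = 1
p = np.linalg.solve(np.vstack([A[:-1], c]), np.r_[np.zeros(n - 1), 1.0])

G = np.outer(p, c)                      # projector onto null(A): G @ G = G
Ag = np.linalg.inv(A + G) - G           # group inverse A^#

assert np.allclose(A @ Ag @ A, A)       # 1. A A^# A = A
assert np.allclose(Ag @ A @ Ag, Ag)     # 2. A^# A A^# = A^#
assert np.allclose(A @ Ag, Ag @ A)      # 3. A A^# = A^# A

# Error characterization: if A pt = r and c^T pt = 1 + w, then
# pt - p = A^# r + w p, which gives bound (4) in the 1-norm.
pt = p + 1e-8 * rng.standard_normal(n)  # a perturbed "computed" solution
r, w = A @ pt, c @ pt - 1.0
assert np.allclose(pt - p, Ag @ r + w * p)
norm1 = lambda B: np.abs(B).sum(axis=0).max()   # induced matrix 1-norm
bound = norm1(Ag) * np.abs(r).sum() + abs(w)    # right-hand side of (4)
```

Here `bound` dominates $\|\Delta p\|_1$, as (4) requires.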
2 Conditioning Aspects of NUMCs

We discuss the condition of the asymptotic stationary distribution of the NUMC (1)-(2), that is, the stationary distribution as $\epsilon \to 0$.
First, let $c = (c_1^T, c_2^T, \ldots, c_t^T)^T$ be partitioned according to the partitioning of $A$. We have that

  $c_i^T B_{ii} = \sum_{j \ne i} c_j^T E_{ji} = \epsilon \sum_{j \ne i} c_j^T F_{ji}$.

The following lemma contains a result that was also observed by Zhang [14].

Lemma 2.1 The diagonal blocks $B_{ii}$, $i = 1, 2, \ldots, t$, have the form

  $B_{ii} = \bar{B}_{ii} + \epsilon\, m_i^{-1} c_i d_i^T$

where $\bar{B}_{ii}$ is a singular M-matrix of rank $m_i - 1$ such that

  $c_i^T \bar{B}_{ii} = 0^T$,

and

  $d_i = \sum_{j \ne i} F_{ji}^T c_j$.

Proof: Let

  $\bar{B}_{ii} = B_{ii} - \epsilon\, m_i^{-1} c_i d_i^T$.

Clearly,

  $c_i^T \bar{B}_{ii} = c_i^T B_{ii} - \epsilon\, m_i^{-1} (c_i^T c_i)\, d_i^T = c_i^T B_{ii} - \epsilon\, d_i^T = 0^T$

by definition. Note that the off-diagonal elements of $\bar{B}_{ii}$ are negative. Since $c = (1, 1, \ldots, 1)^T$, $\bar{B}_{ii}$ is diagonally dominant and thus is an M-matrix. It has rank $m_i - 1$, since if $g \ne 0$ is orthogonal to $c_i$ then

  $g^T \bar{B}_{ii} = g^T B_{ii} \ne 0^T$

since $B_{ii}$ is nonsingular. $\Box$

From Lemma 2.1, $\bar{B}_{ii}$ has a unique Perron vector $q_i$ such that

  $\bar{B}_{ii} q_i = 0$  (5)
  $c_i^T q_i = 1$.  (6)

We now connect this Perron vector to the solution of (1)-(2).
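The splitting of Lemma 2.1 and the Perron vector $q_i$ are easy to check numerically. The following sketch (our own two-block example; all sizes and probabilities are invented, and the $\epsilon m_i^{-1} c_i d_i^T$ correction is formed directly from the coupling blocks) verifies $c_1^T \bar{B}_{11} = 0^T$ and compares $q_1$ with the normalized first block of the true stationary vector:

```python
import numpy as np

rng = np.random.default_rng(2)
eps, m = 1e-6, 3                         # coupling strength and block size
P11, P22 = rng.random((m, m)), rng.random((m, m))
E12, E21 = eps * rng.random((m, m)), eps * rng.random((m, m))
P = np.block([[P11, E12], [E21, P22]])
P /= P.sum(axis=1, keepdims=True)
n = 2 * m
A = np.eye(n) - P.T

B11 = A[:m, :m]                          # diagonal block B_11 = I - P_11
c1 = np.ones(m)
eps_d1 = -A[m:, :m].sum(axis=0)          # eps * d_1^T = c_2^T E_21
assert np.allclose(c1 @ B11, eps_d1)     # c_1^T B_11 = eps * d_1^T

Bbar = B11 - np.outer(c1, eps_d1) / m    # B_11 - eps m^-1 c_1 d_1^T
assert np.allclose(c1 @ Bbar, 0)         # c_1 is a left null vector

# Perron vector q_1: solve Bbar q_1 = 0 with c_1^T q_1 = 1
q1 = np.linalg.solve(np.vstack([Bbar[:-1], c1]),
                     np.r_[np.zeros(m - 1), 1.0])

# stationary vector of the full chain, for comparison
p = np.linalg.solve(np.vstack([A[:-1], np.ones(n)]),
                    np.r_[np.zeros(n - 1), 1.0])
s1 = p[:m] / p[:m].sum()                 # normalized block direction s_1
```

In this example $\|s_1 - q_1\|_1$ comes out at $O(\epsilon)$, consistent with (8) below.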
Lemma 2.2 The $i$th block component $p_i \in \mathbb{R}^{m_i}$ of $p$ has the form

  $p_i = \epsilon \bar{B}_{ii}^{\#} w_i + \gamma_i(\epsilon)\, q_i$

where $w_i = z_i - m_i^{-1}\|z_i\|_1 c_i$, $z_i = \sum_{j \ne i} F_{ij} p_j$, and $\gamma_i(\epsilon)$ is given by

  $\gamma_i(\epsilon) = \dfrac{\|z_i\|_1 - \epsilon\, d_i^T \bar{B}_{ii}^{\#} w_i}{d_i^T q_i}$.  (7)

Moreover, if $\epsilon\|\bar{B}_{ii}^{\#}\|_1 \le 1/4$ then

  $\|\gamma_i(\epsilon)^{-1} p_i - q_i\|_1 \le 4\epsilon\|\bar{B}_{ii}^{\#}\|_1$.

Also, if $\epsilon\|\bar{B}_{ii}^{\#}\|_1 \le 1/8$ and $s_i = p_i/\|p_i\|_1$ then

  $\|s_i - q_i\|_1 \le 8\epsilon\|\bar{B}_{ii}^{\#}\|_1$.  (8)

Proof: Let

  $z_i = \sum_{j \ne i} F_{ij} p_j$.

Then the $i$th block equation of (1) is

  $B_{ii} p_i = \epsilon z_i$.

Note that if $z_i = 0$ then $B_{ii}$ would be singular. Thus $z_i \ne 0$. We have

  $\|z_i\|_1 \le \sum_{j \ne i}\|F_{ij} p_j\|_1 \le \sum_{j \ne i}\|F_{ij}\|_1\|p_j\|_1 \le \sum_{j \ne i}\|p_j\|_1 = 1 - \|p_i\|_1 < 1$.

Adding the extra equation $\epsilon\, d_i^T p_i = \epsilon\|z_i\|_1$, obtained by multiplying the $i$th block equation by $c_i^T$, yields

  $\begin{pmatrix} B_{ii} \\ \epsilon\, d_i^T \end{pmatrix} p_i = \epsilon\begin{pmatrix} z_i \\ \|z_i\|_1 \end{pmatrix}$.

Applying an $(m_i+1) \times (m_i+1)$ Householder transformation $H$ that eliminates the $c_i$-component from the first $m_i$ equations transforms this, up to row scaling, into
the system

  $\begin{pmatrix} \bar{B}_{ii} \\ \epsilon\, d_i^T \end{pmatrix} p_i = \epsilon\begin{pmatrix} w_i \\ \|z_i\|_1 \end{pmatrix}$,

which is overdetermined with a zero residual. The solution $p_i$ has the form

  $p_i = \epsilon \bar{B}_{ii}^{-} w_i + \gamma_i(\epsilon)\, q_i$

where $\bar{B}_{ii}^{-}$ is any matrix such that $\bar{B}_{ii}\bar{B}_{ii}^{-}\bar{B}_{ii} = \bar{B}_{ii}$. We will suppress the argument $\epsilon$ in $\gamma_i(\epsilon)$. For consistency with other arguments given here we choose $\bar{B}_{ii}^{-} = \bar{B}_{ii}^{\#}$, the group inverse. Simply solving the last equation yields the expression (7) for $\gamma_i$.

From the above we note that

  $\gamma_i^{-1} p_i - q_i = \dfrac{\epsilon\, d_i^T q_i}{\|z_i\|_1 - \epsilon\, d_i^T \bar{B}_{ii}^{\#} w_i}\, \bar{B}_{ii}^{\#} w_i$.

From standard norm bounds, using $\|w_i\|_1 \le 2\|z_i\|_1$, $\|d_i\|_\infty \le 1$, and $\|q_i\|_1 = 1$,

  $\|\gamma_i^{-1} p_i - q_i\|_1 \le \dfrac{2\epsilon\,\|\bar{B}_{ii}^{\#}\|_1\|z_i\|_1}{\|z_i\|_1 - \epsilon\, d_i^T \bar{B}_{ii}^{\#} w_i} \le 4\epsilon\|\bar{B}_{ii}^{\#}\|_1$,

since $\epsilon\|\bar{B}_{ii}^{\#}\|_1 \le 1/4$ implies $\|z_i\|_1 - \epsilon\, d_i^T \bar{B}_{ii}^{\#} w_i \ge \|z_i\|_1/2$.

Finally, we show (8). We have that

  $\|s_i - q_i\|_1 \le \|s_i - \gamma_i^{-1} p_i\|_1 + \|\gamma_i^{-1} p_i - q_i\|_1$.

Since $\|q_i\|_1 = 1$,

  $\|s_i - \gamma_i^{-1} p_i\|_1 = \dfrac{|\gamma_i - \|p_i\|_1|}{|\gamma_i|} = \dfrac{|\,\gamma_i\|q_i\|_1 - \|p_i\|_1\,|}{|\gamma_i|} \le \|q_i - \gamma_i^{-1} p_i\|_1 \le 4\epsilon\|\bar{B}_{ii}^{\#}\|_1$,

and we have (8). $\Box$

Note that if we use the Moore-Penrose inverse instead of the group inverse, then

  $p_i = \epsilon \bar{B}_{ii}^{\dagger}(z_i - m_i^{-1}\|z_i\|_1 c_i) + \gamma_i q_i = \epsilon \bar{B}_{ii}^{\dagger} z_i + \gamma_i q_i$

since $\bar{B}_{ii}^{\dagger} c_i = 0$. A similar argument with the assumption $\epsilon\|\bar{B}_{ii}^{\dagger}\|_1 \le 1/2$ leads to

  $\|\gamma_i^{-1} p_i - q_i\|_1 \le 2\epsilon\|\bar{B}_{ii}^{\dagger}\|_1$.

This can be a slightly tighter bound, but the group inverse bound is more consistent with the analysis in the remainder of this paper.
Thus, if $\epsilon\|\bar{B}_{ii}^{\#}\|_1$ is bounded away from one, the direction $\gamma_i^{-1} p_i$ should be close to the asymptotic direction $q_i$. Therefore the condition of the direction $p_i$ is dependent upon the condition of $\bar{B}_{ii}$. We let $\kappa_B$ be defined by

  $\kappa_B = \max_{1 \le j \le t}\|\bar{B}_{jj}^{\#}\|_1$.

We define the vector $\bar{x} = (\bar\gamma_1, \ldots, \bar\gamma_t)^T \in \mathbb{R}^t$ and aggregation matrix $\bar{A}$ as follows. Let $\bar{A} = (\bar\alpha_{ij})$ satisfy

  $\bar\alpha_{ii} = d_i^T q_i$,
  $\bar\alpha_{ij} = -c_i^T F_{ij} q_j$, $j \ne i$.

Then the vector $\bar{x} = (\bar\gamma_1, \bar\gamma_2, \ldots, \bar\gamma_t)^T$ satisfies

  $\bar{A}\bar{x} = 0, \quad \bar{c}^T\bar{x} = 1$  (9)

where $\bar{c} = (1, 1, \ldots, 1)^T \in \mathbb{R}^t$. We note that $\bar{A}$ is an M-matrix since we have

  $\bar\alpha_{ii} > 0$, $i = 1, 2, \ldots, t$, and $\bar\alpha_{ij} \le 0$, $i \ne j$.

Also, we have

  $\bar{c}^T\bar{A} = 0^T$,

thus it is diagonally dominant. We now give a lemma that relates $\bar{x} = (\bar\gamma_1, \ldots, \bar\gamma_t)^T$ and $x(\epsilon) = (\gamma_1(\epsilon), \gamma_2(\epsilon), \ldots, \gamma_t(\epsilon))^T$.

Lemma 2.3 The vector $x(\epsilon) = (\gamma_1(\epsilon), \ldots, \gamma_t(\epsilon))^T$ defined in Lemma 2.2 and the vector $\bar{x} = (\bar\gamma_1, \ldots, \bar\gamma_t)^T$ satisfy

  $\|\bar{x} - x(\epsilon)\|_1 \le 4\epsilon\kappa_B\left(\|\bar{A}^{\#}\|_1 + \tfrac{1}{2}\right)$.

Thus clearly,

  $\lim_{\epsilon \to 0} x(\epsilon) = \bar{x}$.

Proof: Again, we suppress the argument $\epsilon$ in $\gamma_i(\epsilon)$. From (7) we have

  $\bar\alpha_{ii}\gamma_i = \|z_i\|_1 - \epsilon\, d_i^T \bar{B}_{ii}^{\#} w_i$.  (10)

Also, the definition of $z_i$ gives us

  $\|z_i\|_1 = \sum_{j \ne i} c_i^T F_{ij} p_j = \sum_{j \ne i} \gamma_j\, c_i^T F_{ij} q_j + \epsilon\sum_{j \ne i} c_i^T F_{ij} \bar{B}_{jj}^{\#} w_j$.  (11)
Combining (10) and (11) yields

  $\bar{A}\, x(\epsilon) = \epsilon\, r$  (12)

where

  $r = (\rho_1, \rho_2, \ldots, \rho_t)^T, \quad \rho_i = \sum_{j \ne i} c_i^T F_{ij} \bar{B}_{jj}^{\#} w_j - d_i^T \bar{B}_{ii}^{\#} w_i$.

Equation (12) is consistent because (11) is consistent. Now to bound $\|r\|_1$. We have

  $|\rho_i| \le \sum_{j \ne i}\|F_{ij}\|_1\|\bar{B}_{jj}^{\#} w_j\|_1 + |d_i^T \bar{B}_{ii}^{\#} w_i| \le 2\sum_{j \ne i}\|F_{ij}\|_1\|\bar{B}_{jj}^{\#}\|_1\|z_j\|_1 + 2\|d_i\|_\infty\|\bar{B}_{ii}^{\#}\|_1\|z_i\|_1 \le 2\kappa_B\Big[\sum_{j \ne i}\|F_{ij}\|_1\|z_j\|_1 + \|z_i\|_1\Big]$.

Thus

  $\|r\|_1 \le 2\kappa_B\Big[\sum_{i=1}^t\sum_{j \ne i}\|F_{ij}\|_1\|z_j\|_1 + \sum_{i=1}^t\|z_i\|_1\Big] = 2\kappa_B\Big[\sum_{j=1}^t\|z_j\|_1\sum_{i \ne j}\|F_{ij}\|_1 + \sum_{i=1}^t\|z_i\|_1\Big] \le 4\kappa_B\sum_{i=1}^t\|z_i\|_1 \le 4\kappa_B\sum_{i=1}^t\sum_{j \ne i}\|F_{ij} p_j\|_1 \le 4\kappa_B\sum_{j=1}^t\|p_j\|_1\sum_{i \ne j}\|F_{ij}\|_1 \le 4\kappa_B$.

Now we look at the side condition on $\bar{c}^T x(\epsilon)$. We have

  $1 = c^T p = \sum_{i=1}^t c_i^T p_i = \sum_{i=1}^t \gamma_i\, c_i^T q_i + \epsilon\sum_{i=1}^t c_i^T \bar{B}_{ii}^{\#} w_i = \bar{c}^T x(\epsilon) + \epsilon\sum_{i=1}^t c_i^T \bar{B}_{ii}^{\#} w_i$.

Thus

  $|\bar{c}^T x(\epsilon) - 1| = \epsilon\Big|\sum_{i=1}^t c_i^T \bar{B}_{ii}^{\#} w_i\Big| \le \epsilon\sum_{i=1}^t\|\bar{B}_{ii}^{\#}\|_1\|w_i\|_1 \le 2\epsilon\kappa_B\sum_{i=1}^t\|z_i\|_1 \le 2\epsilon\kappa_B$,

and since $\bar{c}^T\bar{x} = 1$,

  $|\bar{c}^T(x(\epsilon) - \bar{x})| \le 2\epsilon\kappa_B$.  (13)
Combining (12) and (13) with equation (4) yields

  $\|\bar{x} - x(\epsilon)\|_1 \le \|\bar{A}^{\#}\|_1\,\epsilon\|r\|_1 + 2\epsilon\kappa_B \le 4\epsilon\kappa_B\left(\|\bar{A}^{\#}\|_1 + \tfrac{1}{2}\right)$.  $\Box$

We will consider the accuracy of the approximation to $\bar{x}$ rather than $x(\epsilon)$. That error is simpler to characterize. It is likely that there are slightly better error bounds in terms of $x(\epsilon)$, but they are harder to understand.

The condition of the vectors $q_1, q_2, \ldots, q_t$ and the vector $\bar{x}$ are characterized separately. Consider

  $(A + \Delta A)\tilde{p} = 0, \quad c^T\tilde{p} = 1$  (14)

where

  $\Delta A = \begin{pmatrix} \Delta B_{11} & -\epsilon\Delta F_{12} & \cdots & -\epsilon\Delta F_{1t} \\ -\epsilon\Delta F_{21} & \Delta B_{22} & \cdots & -\epsilon\Delta F_{2t} \\ \vdots & & \ddots & \vdots \\ -\epsilon\Delta F_{t1} & -\epsilon\Delta F_{t2} & \cdots & \Delta B_{tt} \end{pmatrix}$.

Here we let

  $\max_{1 \le j \le t}\|\Delta B_{jj}\|_1 = \delta_B$  (15)
  $\max_{1 \le j \le t}\sum_{i \ne j}\|\Delta F_{ij}\|_1 = \delta_F$.  (16)

Thus the perturbation is "structured" in the sense that the off-diagonal blocks have perturbations proportional to $\epsilon$. We express our main result in this section in two perturbation theorems: one for the directions $q_1, q_2, \ldots, q_t$ and one for $\bar{x} = (\bar\gamma_1, \bar\gamma_2, \ldots, \bar\gamma_t)^T$.

Theorem 2.1 Let $\tilde{p}$ be the solution to (14). Assume $\max\{2\epsilon, \delta_B\}\kappa_B \le 1/2$. Then

  $\tilde{p} = (\tilde\gamma_1\tilde{q}_1^T, \tilde\gamma_2\tilde{q}_2^T, \ldots, \tilde\gamma_t\tilde{q}_t^T)^T + v$

where

  $\|v\|_1 \le 4\epsilon\kappa_B(1 + \delta_F)$

and

  $\|\tilde{q}_i - q_i\|_1 \le \kappa_B\delta_B, \quad i = 1, 2, \ldots, t$,

for suitable constants $\tilde\gamma_1, \tilde\gamma_2, \ldots, \tilde\gamma_t$.

Proof: From applying the analysis of equation (9) to $A + \Delta A$ we have that $\tilde{p} = (\tilde{p}_1^T, \tilde{p}_2^T, \ldots, \tilde{p}_t^T)^T$ satisfies

  $\tilde{p}_i = \tilde\gamma_i\tilde{q}_i + v_i$
where

  $\tilde{\bar{B}}_{ii}\tilde{q}_i = 0$  (17)
  $c_i^T\tilde{q}_i = 1$  (18)
  $\tilde{\bar{B}}_{ii} = (B_{ii} + \Delta B_{ii}) - \epsilon\, m_i^{-1} c_i\tilde{d}_i^T$,

and

  $v_i = \epsilon\tilde{\bar{B}}_{ii}^{\#}\tilde{w}_i, \quad \tilde{w}_i = \tilde{z}_i - m_i^{-1}\|\tilde{z}_i\|_1 c_i, \quad \tilde{z}_i = \sum_{j \ne i}(F_{ij} + \Delta F_{ij})\tilde{p}_j$.

We now need bounds on $\|\tilde{q}_i - q_i\|_1$ and $\|v\|_1$. The perturbation preserves the rank of $\tilde{\bar{B}}_{ii}$, since the results of Lemma 2.1 apply to $\tilde{\bar{B}}_{ii}$. Thus, from a bound due to Schweitzer [10],

  $\|\tilde{\bar{B}}_{ii}^{\#}\|_1 = \|\bar{B}_{ii}^{\#}(I + (\tilde{\bar{B}}_{ii} - \bar{B}_{ii})\bar{B}_{ii}^{\#})^{-1}\|_1 \le \dfrac{\|\bar{B}_{ii}^{\#}\|_1}{1 - \delta_B\kappa_B} \le 2\|\bar{B}_{ii}^{\#}\|_1 \le 2\kappa_B$.

We can bound $\|v\|_1$, where $v = (v_1^T, v_2^T, \ldots, v_t^T)^T$, from

  $\|v_i\|_1 \le \epsilon\|\tilde{\bar{B}}_{ii}^{\#}\|_1\|\tilde{w}_i\|_1 \le 2\epsilon\kappa_B\|\tilde{w}_i\|_1$.

Since $\tilde{w}_i = \tilde{z}_i - m_i^{-1}\|\tilde{z}_i\|_1 c_i$, we have $\|\tilde{w}_i\|_1 \le 2\|\tilde{z}_i\|_1$, thus

  $\|v_i\|_1 \le 4\epsilon\kappa_B\|\tilde{z}_i\|_1$.

Therefore

  $\|v\|_1 = \sum_{i=1}^t\|v_i\|_1 \le 4\epsilon\kappa_B\sum_{i=1}^t\|\tilde{z}_i\|_1$.

So we have

  $\|v\|_1 \le 4\epsilon\kappa_B\sum_{i=1}^t\Big[\sum_{j \ne i}(\|F_{ij}\|_1 + \|\Delta F_{ij}\|_1)\|\tilde{p}_j\|_1\Big]$  (19)
  $= 4\epsilon\kappa_B\sum_{j=1}^t\|\tilde{p}_j\|_1\Big[\sum_{i \ne j}(\|F_{ij}\|_1 + \|\Delta F_{ij}\|_1)\Big]$  (20)
  $\le 4\epsilon\kappa_B(1 + \delta_F)$.  (21)

To get the bound on $\Delta q_i = \tilde{q}_i - q_i$, we use (17)-(18) to obtain

  $\bar{B}_{ii}\Delta q_i = (\bar{B}_{ii} - \tilde{\bar{B}}_{ii})\tilde{q}_i$  (22)
  $c_i^T\Delta q_i = 0$.  (23)

Since (17)-(18) is consistent, so is (22)-(23), thus

  $\|\Delta q_i\|_1 \le \|\bar{B}_{ii}^{\#}\|_1\|\bar{B}_{ii} - \tilde{\bar{B}}_{ii}\|_1\|\tilde{q}_i\|_1 \le \kappa_B\delta_B$.  $\Box$

We now give the perturbation theorem for $\bar{x}$ from an aggregation step.
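Before stating it, the aggregation construction (9) itself can be checked numerically. In this sketch (our own two-block example; the $\epsilon$ factor is absorbed into the entries of $\bar{A}$, which leaves its null vector $\bar{x}$ unchanged), the aggregate matrix is formed from $q_1, q_2$ and its solution is compared with the block masses $\|p_1\|_1, \|p_2\|_1$:

```python
import numpy as np

rng = np.random.default_rng(3)
eps, m = 1e-6, 3
P = np.block([[rng.random((m, m)), eps * rng.random((m, m))],
              [eps * rng.random((m, m)), rng.random((m, m))]])
P /= P.sum(axis=1, keepdims=True)
n = 2 * m
A = np.eye(n) - P.T

def perron(Bbar, c):
    """Solve Bbar q = 0 with c^T q = 1 for the Perron vector of a block."""
    return np.linalg.solve(np.vstack([Bbar[:-1], c]),
                           np.r_[np.zeros(len(c) - 1), 1.0])

c1 = np.ones(m)
eps_d1 = -A[m:, :m].sum(axis=0)          # eps * d_1^T
eps_d2 = -A[:m, m:].sum(axis=0)          # eps * d_2^T
q1 = perron(A[:m, :m] - np.outer(c1, eps_d1) / m, c1)
q2 = perron(A[m:, m:] - np.outer(c1, eps_d2) / m, c1)

# eps * Abar: diagonal d_i^T q_i, off-diagonal -c_i^T F_ij q_j
# (the A blocks are already -E_ij, so no extra sign is needed)
Abar = np.array([[eps_d1 @ q1, c1 @ A[:m, m:] @ q2],
                 [c1 @ A[m:, :m] @ q1, eps_d2 @ q2]])
assert np.allclose(Abar.sum(axis=0), 0)  # cbar^T Abar = 0^T
xbar = np.linalg.solve(np.vstack([Abar[:1], np.ones(2)]), np.r_[0.0, 1.0])

# true block masses from the stationary vector
p = np.linalg.solve(np.vstack([A[:-1], np.ones(n)]),
                    np.r_[np.zeros(n - 1), 1.0])
masses = np.array([p[:m].sum(), p[m:].sum()])
```

Here $\|\bar{x} - (\|p_1\|_1, \|p_2\|_1)^T\|_1$ is $O(\epsilon)$, matching Lemma 2.3.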
Theorem 2.2 Assume the hypothesis and terminology of Theorem 2.1. Let $y$ be the solution of the system

  $\tilde{\bar{A}}y = \tilde{r}, \quad \|\tilde{r}\|_1 \le \delta_r, \quad \bar{c}^T y = 1$

where $\tilde{\bar{A}} = (\tilde{\bar\alpha}_{ij})$ and

  $\tilde{\bar\alpha}_{ij} = -c_i^T(F_{ij} + \Delta F_{ij})\tilde{s}_j$, $\tilde{s}_j = \tilde{p}_j/\|\tilde{p}_j\|_1$, $i \ne j$,
  $\tilde{\bar\alpha}_{ii} = \tilde{d}_i^T\tilde{s}_i$, $\tilde{d}_i = \sum_{j \ne i}(F_{ji} + \Delta F_{ji})^T c_j$.

Then

  $\|y - \bar{x}\|_1 \le [(16\epsilon + 2\delta_B)\kappa_B + 2\delta_F + \delta_r]\,\|\bar{A}^{\#}\|_1$.

Proof: We first show that $\tilde{\bar{A}} = \bar{A} + \Delta\bar{A}$ where

  $\|\Delta\bar{A}\|_1 \le (16\epsilon + 2\delta_B)\kappa_B + 2\delta_F$.  (24)

The main result comes right out of the perturbation theory. For $i \ne j$, we have

  $|\Delta\bar\alpha_{ij}| = |\tilde{\bar\alpha}_{ij} - \bar\alpha_{ij}| = |c_i^T(F_{ij} + \Delta F_{ij})\tilde{s}_j - c_i^T F_{ij} q_j| = |c_i^T\Delta F_{ij}\tilde{s}_j + c_i^T F_{ij}(\tilde{s}_j - q_j)| \le \|\Delta F_{ij}\|_1 + \|F_{ij}\|_1\|\tilde{s}_j - q_j\|_1 \le \|\Delta F_{ij}\|_1 + (8\epsilon + \delta_B)\kappa_B\|F_{ij}\|_1$.

Therefore

  $\sum_{i \ne j}|\Delta\bar\alpha_{ij}| \le \sum_{i \ne j}\big[\|\Delta F_{ij}\|_1 + (8\epsilon + \delta_B)\kappa_B\|F_{ij}\|_1\big] \le \delta_F + (8\epsilon + \delta_B)\kappa_B$.  (25)

For the diagonal elements, we have

  $|\Delta\bar\alpha_{jj}| = |\tilde{\bar\alpha}_{jj} - \bar\alpha_{jj}| = |\tilde{d}_j^T\tilde{s}_j - d_j^T q_j|$
  $\le |(\tilde{d}_j - d_j)^T\tilde{s}_j| + |d_j^T(\tilde{s}_j - q_j)| \le \|\tilde{d}_j - d_j\|_\infty\|\tilde{s}_j\|_1 + \|d_j\|_\infty\|\tilde{s}_j - q_j\|_1 \le \delta_F + (8\epsilon + \delta_B)\kappa_B$.  (26)
Thus from (25) and (26) we have (24). Now from

  $\bar{A}y = \tilde{\bar{A}}y - \Delta\bar{A}\,y$

we have

  $\bar{A}y = -\Delta\bar{A}\,y + \tilde{r}$, with $\bar{c}^T y = 1$.

From our perturbation theory, we have

  $\|y - \bar{x}\|_1 \le \|\bar{A}^{\#}\|_1\big[\|\Delta\bar{A}\|_1\|y\|_1 + \|\tilde{r}\|_1\big] \le [2\delta_F + \delta_r + (16\epsilon + 2\delta_B)\kappa_B]\,\|\bar{A}^{\#}\|_1$.  $\Box$

Thus the condition of $q_1, q_2, \ldots, q_t$ is related solely to the condition of the diagonal blocks, whereas the condition of $\bar{x}$ appears to be bounded by the product of the condition of the diagonal blocks and that of the aggregate matrix $\bar{A}$. Note that both conditions are independent of $\epsilon$. The main purpose of these bounds is to give us a framework for analyzing algorithms. We now turn our attention to the accuracy of an aggregation/disaggregation algorithm.

3 Use of the above results in iterative aggregation/disaggregation methods

Suppose that we have a computed solution $\tilde{p}$ to (1)-(2) such that

  $A\tilde{p} = r, \quad c^T\tilde{p} = 1$  (27)

where

  $\tilde{p} = (\tilde{p}_1^T, \tilde{p}_2^T, \ldots, \tilde{p}_t^T)^T$, $\tilde{p}_i \in \mathbb{R}^{m_i}$,
  $r = (r_1^T, r_2^T, \ldots, r_t^T)^T$, $r_i \in \mathbb{R}^{m_i}$,  (28)
  $\|r_i\|_1 \le \epsilon\delta_B$, $i = 1, 2, \ldots, t$.  (29)

Moreover, assume that $y = (\|\tilde{p}_1\|_1, \|\tilde{p}_2\|_1, \ldots, \|\tilde{p}_t\|_1)^T = (\eta_1, \eta_2, \ldots, \eta_t)^T$ satisfies

  $\tilde{\bar{A}}y = \tilde{r}, \quad \|\tilde{r}\|_1 \le \delta_r$  (30)
  $\bar{c}^T y = 1$.  (31)
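A minimal sketch of such an iteration follows (our own generic power-smoothed aggregation/disaggregation sweep in the spirit of [12, 6], not the exact algorithms analyzed there; block sizes, tolerances, and the function name `iad` are invented), with a residual-based stopping test of the form (27)-(29):

```python
import numpy as np

def iad(P, sizes, tol=1e-10, maxit=60):
    """Iterative aggregation/disaggregation for A p = 0, c^T p = 1,
    stopping when the residual ||A p~||_1 drops below tol."""
    n = P.shape[0]
    A = np.eye(n) - P.T
    idx = np.cumsum([0] + list(sizes))
    t = len(sizes)
    p = np.full(n, 1.0 / n)                 # uniform start, c^T p = 1
    for _ in range(maxit):
        # aggregation: t x t column-stochastic coupling matrix
        W = np.empty((t, t))
        for i in range(t):
            for j in range(t):
                blk = P.T[idx[i]:idx[i + 1], idx[j]:idx[j + 1]]
                sj = p[idx[j]:idx[j + 1]]
                W[i, j] = blk.sum(axis=0) @ (sj / sj.sum())
        # solve W x = x, sum(x) = 1 (t is tiny, so solve directly)
        M = np.vstack([(np.eye(t) - W)[:-1], np.ones(t)])
        x = np.linalg.solve(M, np.r_[np.zeros(t - 1), 1.0])
        # disaggregation: rescale the block masses, then one smoothing step
        for j in range(t):
            sj = p[idx[j]:idx[j + 1]]
            p[idx[j]:idx[j + 1]] = x[j] * sj / sj.sum()
        p = P.T @ p
        p /= p.sum()
        if np.abs(A @ p).sum() <= tol:      # stopping test on the residual
            return p
    return p

# usage on a small two-block NUMC
rng = np.random.default_rng(4)
eps, m = 1e-8, 3
P = np.block([[rng.random((m, m)), eps * rng.random((m, m))],
              [eps * rng.random((m, m)), rng.random((m, m))]])
P /= P.sum(axis=1, keepdims=True)
p_iad = iad(P, [m, m])
```

The aggregation step stabilizes the slow inter-block mode that a plain power iteration would need $O(\epsilon^{-1})$ steps to resolve, while the smoothing step handles the fast within-block modes.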
Here $\tilde{\bar{A}} = (\tilde{\bar\alpha}_{ij})$,

  $\tilde{\bar\alpha}_{ij} = -c_i^T(F_{ij} + \Delta F_{ij})\tilde{s}_j$, $i \ne j$,
  $\tilde{\bar\alpha}_{ii} = -\sum_{j \ne i}\tilde{\bar\alpha}_{ji}$, $i = 1, 2, \ldots, t$,

and $\Delta F_{ij}$, $i = 1, 2, \ldots, t$, $j \ne i$, satisfies the hypothesis of Theorem 2.1. Then we can use our perturbation theory to ensure that there is a "reasonable" solution to (1)-(2). The stopping criteria (27)-(29) could be used with almost any iterative method that is coupled with an aggregation step, for instance, that in [12, 6], provided that the aggregation conforms with the pattern of "near uncoupling".

Note that $\tilde{p}$ satisfies

  $(A + \Delta A)\tilde{p} = 0, \quad c^T\tilde{p} = 1$

where $\|\Delta B_{ii}\|_1 \le \delta_B$. Thus $\tilde{p}$ satisfies the hypothesis of Theorem 2.1 and $y$ satisfies the hypothesis of Theorem 2.2. Equations (27)-(29) give a simple stopping criterion for an iterative method, whereas (30)-(31) can be achieved by Gaussian elimination on $\tilde{\bar{A}}$ [12] or, even better, the GTH variant of Gaussian elimination [5, 13, 9]. Even when we use iterative methods for both the problem (1)-(2) and the solution to (9), the above characterization can be used to determine stopping criteria for the iterations for both problems.

Thus, the results in this paper show that iterative aggregation/disaggregation methods will obtain an accurate solution to (1)-(2). The directions $s_1, s_2, \ldots, s_t$ are stable without the aggregation step. The norms $\eta_i = \|\tilde{p}_i\|_1$, $i = 1, 2, \ldots, t$, are stabilized by an aggregation step.

Acknowledgements

The author had valuable discussions with Pete Stewart and Dan Heyman during the course of this research. The Institute for Mathematics and Its Applications at the University of Minnesota provided the author with a very hospitable and stimulating environment during its special year in Applied Linear Algebra.

References

[1] J.L. Barlow. On the smallest positive singular value of a singular M-matrix with applications to ergodic Markov chains. SIAM J. Alg. Disc. Methods, 7:414-424, 1986.

[2] J.L. Barlow. Error bounds and condition estimates for the computation of null vectors with applications to Markov chains.
Technical Report CS-9-2, The Pennsylvania State University, Department of Computer Science, University Park, PA, 1991. To appear, SIAM J. Matrix Anal. Appl.
[3] R.E. Funderlic and C.D. Meyer, Jr. Sensitivity of the stationary distribution for an ergodic Markov chain. Linear Alg. Appl., 76:1-17, 1986.

[4] A.J. Geurts. A contribution to the theory of condition. Numerische Mathematik, 39:85-96, 1982.

[5] W.K. Grassman, M.I. Taksar, and D.P. Heyman. Regenerative analysis and steady state distributions for Markov chains. Operations Research, 33:1107-1116, 1985.

[6] D.F. McAllister, G.W. Stewart, and W.J. Stewart. On a Rayleigh-Ritz refinement technique for nearly uncoupled stochastic matrices. Lin. Alg. Appl., 60:1-25, 1984.

[7] C.D. Meyer, Jr. The role of the group inverse in the theory of finite Markov chains. SIAM Review, 17:443-464, 1975.

[8] C.D. Meyer, Jr. and G.W. Stewart. Derivatives and perturbations of eigenvectors. SIAM J. Numer. Anal., 25:679-691, 1988.

[9] C.A. O'Cinneide. Error analysis of a variant of Gaussian elimination for steady-state distributions of Markov chains. Technical report, Purdue University, West Lafayette, IN, 1992.

[10] P.J. Schweitzer. Perturbation theory and finite Markov chains. Journal of Applied Probability, 5:401-413, 1968.

[11] G.W. Stewart. Error bounds for approximate invariant subspaces of closed linear operators. SIAM J. Numer. Anal., 8:796-808, 1971.

[12] G.W. Stewart, W.J. Stewart, and D.F. McAllister. A two-stage iteration for solving nearly uncoupled Markov chains. Technical Report CSC TR-38, Department of Computer Science, University of Maryland, College Park, MD, April 1984.

[13] G.W. Stewart and G. Zhang. On a direct method for the solution of nearly uncoupled Markov chains. Numerische Mathematik, 59:1-11, 1991.

[14] G. Zhang. On the sensitivity of the solution of nearly uncoupled Markov chains. Technical Report UMIACS TR 9-8, Institute for Advanced Computer Studies, University of Maryland, College Park, MD, February 1991.