SIAM J. Matrix Anal. Appl., Vol. 15, No. 3, pp. 715-728, July 1994
(c) 1994 Society for Industrial and Applied Mathematics

SENSITIVITY OF THE STATIONARY DISTRIBUTION OF A MARKOV CHAIN*

CARL D. MEYER†

Abstract. It is well known that if the transition matrix of an irreducible Markov chain of moderate size has a subdominant eigenvalue which is close to 1, then the chain is ill conditioned in the sense that there are stationary probabilities which are sensitive to perturbations in the transition probabilities. However, the converse of this statement has heretofore been unresolved. The purpose of this article is to address this issue by establishing upper and lower bounds on the condition number of the chain such that the bounding terms are functions of the eigenvalues of the transition matrix. Furthermore, it is demonstrated how to obtain estimates for the condition number of an irreducible chain with little or no extra computational effort over that required to compute the stationary probabilities by means of an LU or QR factorization.

Key words. Markov chains, stationary distribution, stochastic matrix, sensitivity analysis, perturbation theory, character of a Markov chain, condition numbers

AMS subject classifications. 65U05, 65F35, 60J10, 60J20, 15A51, 15A12, 15A18

1. Introduction. The problem under consideration is that of analyzing the effects of small perturbations to the transition probabilities of a finite, irreducible, homogeneous Markov chain. More precisely, if $P_{n\times n}$ is the transition probability matrix for such a chain, and if $\pi^T = (\pi_1, \pi_2, \ldots, \pi_n)$ is the stationary distribution vector satisfying $\pi^T P = \pi^T$ and $\sum_{i=1}^n \pi_i = 1$, the goal is to describe the effect on $\pi^T$ when $P$ is perturbed by a matrix $E$ such that $\tilde P = P + E$ is the transition probability matrix of another irreducible Markov chain.

Schweitzer (1968) provided the first perturbation analysis in terms of Kemeny and Snell's fundamental matrix $Z = (A + e\pi^T)^{-1}$, in which $A = I - P$ and $e$ is a column of 1's. If $A^{\#}$ denotes the group inverse of $A$ [Meyer (1975) or Campbell and Meyer (1991)], then $Z = (A + e\pi^T)^{-1} = A^{\#} + e\pi^T$. But in virtually all applications involving $Z$, the term $e\pi^T$ is redundant; i.e., all relevant information is contained in $A^{\#}$. In particular, if $\tilde\pi^T = (\tilde\pi_1, \tilde\pi_2, \ldots, \tilde\pi_n)$ is the stationary distribution for $\tilde P = P + E$, then

(1.1)    $\tilde\pi^T = \pi^T\big(I + EA^{\#}\big)^{-1}$

and

(1.2)    $\|\tilde\pi^T - \pi^T\| \le \|E\|\,\|A^{\#}\|$,

in which $\|\cdot\|$ can be either the 1-, 2-, or $\infty$-norm. If the $j$th column and the $(i,j)$-entry of $A^{\#}$ are denoted by $A^{\#}_{*j}$ and $a^{\#}_{ij}$, respectively, then

(1.3)    $|\tilde\pi_j - \pi_j| \le \|E\|\,\|A^{\#}_{*j}\|$.

*Received by the editors April 6, 1992; accepted for publication (in revised form) October 30, 1992. This work was supported in part by National Science Foundation grants DMS-9020915 and DDM-8906248.
†North Carolina State University, Mathematics Department, Raleigh, North Carolina 27695-8205 (meyer@math.ncsu.edu).
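These identities are easy to exercise numerically. The following sketch is an illustration added to this transcription, not part of the paper: it builds $A^{\#}$ from the fundamental matrix $Z = (A + e\pi^T)^{-1}$ exactly as described above and checks (1.1) and (1.2); the 3-state matrix P and the zero-row-sum perturbation E are arbitrary choices, and NumPy is assumed.

```python
import numpy as np

# A small irreducible transition matrix (an arbitrary illustration).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
n = P.shape[0]
A = np.eye(n) - P

# Stationary distribution: left eigenvector of P for the eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi /= pi.sum()

# Group inverse via the fundamental matrix: Z = (A + e*pi^T)^{-1}, A# = Z - e*pi^T.
e = np.ones(n)
Z = np.linalg.inv(A + np.outer(e, pi))
Ag = Z - np.outer(e, pi)

# Perturb P by a matrix E with zero row sums, so P + E is again stochastic.
E = 1e-3 * np.array([[ 1.0, -1.0,  0.0],
                     [ 0.0,  1.0, -1.0],
                     [-1.0,  0.0,  1.0]])
vals2, vecs2 = np.linalg.eig((P + E).T)
pit = np.real(vecs2[:, np.argmin(np.abs(vals2 - 1.0))])
pit /= pit.sum()

print(np.allclose(pit, pi @ np.linalg.inv(np.eye(n) + E @ Ag)))   # (1.1): True
lhs = np.linalg.norm(pit - pi, np.inf)
rhs = np.linalg.norm(E, np.inf) * np.linalg.norm(Ag, np.inf)
print(lhs <= rhs)                                                 # (1.2): True
```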

In addition,

(1.4)    $\max_j |\tilde\pi_j - \pi_j| \le \|E\|_\infty \max_{i,j}|a^{\#}_{ij}|$.

This bound is about as good as possible; see Ipsen and Meyer (1994) for a discussion of optimal bounds. Moreover, if the transition probabilities are analytic functions of a parameter $t$ so that $P = P(t)$, then

(1.5)    $\frac{d\pi^T}{dt} = \pi^T\frac{dP}{dt}A^{\#}$ and $\frac{d\pi_j}{dt} = \pi^T\frac{dP}{dt}A^{\#}_{*j}$.

The results (1.1) and (1.2) are due to Meyer (1980), and (1.3) appears in Golub and Meyer (1986). The inequality (1.4) was given by Funderlic and Meyer (1986), and the formulas (1.5) are derived in Golub and Meyer (1986) and Meyer and Stewart (1988). Seneta (1991) established an inequality similar to (1.2) using the coefficient of ergodicity $\tau_1(A^{\#})$ in place of $\|A^{\#}\|$.

These facts make it absolutely clear that the entries in $A^{\#}$ determine the extent to which $\pi^T$ is sensitive to small changes in $P$, so, on the basis of (1.4), it is natural to adopt the following definition of Funderlic and Meyer (1986).

Definition 1.1. The condition of a Markov chain with a transition matrix $P$ is measured by the size of its condition number, which is defined to be

$\kappa = \max_{i,j}|a^{\#}_{ij}|$,

where $a^{\#}_{ij}$ is the $(i,j)$-entry in the group inverse $A^{\#}$ of $A = I - P$.

It is an elementary fact that $\kappa$ is invariant under permutations of the states of the chain. For chains of moderate size, it is not difficult to show (see the proof of Theorem 2.1 given in §4) that if there exists a subdominant eigenvalue of $P$ which is close to 1, then $\kappa$ must be large. However, the converse of this statement has heretofore been unresolved, and our purpose is to focus on this issue. More precisely, we address the following question. If the subdominant eigenvalues of an irreducible Markov chain are well separated from 1, can we be sure that the chain is well conditioned? In other words, do the subdominant eigenvalues of $P$ (or equivalently, the nonzero eigenvalues of $A$) somehow provide complete information about the sensitivity of the chain, or do we really need to know something about the singular values of $A$?

The conjecture that $\kappa = \max_{i,j}|a^{\#}_{ij}|$ is somehow controlled by the nonzero eigenvalues of $A$ is contrary to what is generally true; a standard example is the triangular matrix

(1.6)    $T_{n\times n} = \begin{pmatrix} 1 & -2 & 0 & \cdots & 0 & 0\\ 0 & 1 & -2 & \cdots & 0 & 0\\ \vdots & & \ddots & \ddots & & \vdots\\ 0 & 0 & 0 & \cdots & 1 & -2\\ 0 & 0 & 0 & \cdots & 0 & 1 \end{pmatrix}, \qquad T^{-1} = \begin{pmatrix} 1 & 2 & 4 & \cdots & 2^{n-2} & 2^{n-1}\\ 0 & 1 & 2 & \cdots & 2^{n-3} & 2^{n-2}\\ \vdots & & \ddots & \ddots & & \vdots\\ 0 & 0 & 0 & \cdots & 1 & 2\\ 0 & 0 & 0 & \cdots & 0 & 1 \end{pmatrix}$,

for which $\max_{i,j}|[T^{-1}]_{ij}|$ is immense for even moderate values of $n$, but the eigenvalues of $T$ provide no clue whatsoever that this occurs.
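A quick check of this claim (again an added illustration, not the paper's):

```python
import numpy as np

def T(n):
    # The bidiagonal matrix from (1.6): 1's on the diagonal, -2's on the superdiagonal.
    return np.eye(n) - 2.0 * np.eye(n, k=1)

n = 30
Tn = T(n)
print(np.linalg.eigvals(Tn))              # every eigenvalue equals 1
print(np.abs(np.linalg.inv(Tn)).max())    # 2**(n-1) ~ 5.4e8: entries explode anyway
```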

The fact that the eigenvalues are repeated or that $T$ is nonsingular is irrelevant; consider a small perturbation of $T$ or the matrices

$\hat T = \begin{pmatrix} 0 & 0\\ 0 & T \end{pmatrix}$ and $\hat T^{\#} = \begin{pmatrix} 0 & 0\\ 0 & T^{-1} \end{pmatrix}$.

We will prove that, unlike the situation illustrated above, irreducible stochastic matrices $P$ possess enough structure to guarantee that growth of the entries in $A^{\#}$ is controlled by the nonzero eigenvalues of $A = I - P$. As a consequence, it will follow that the sensitivity of an irreducible Markov chain is governed by the location of its subdominant eigenvalues.

2. The main result. In the sequel, it is convenient to adopt the following terminology and notation.

Definition 2.1. Let $P$ be the transition probability matrix of an $n$-state irreducible Markov chain, and let $\sigma(P) = \{1, \lambda_2, \lambda_3, \ldots, \lambda_n\}$ denote the eigenvalues of $P$. The character¹ of the chain is defined to be the (necessarily real) number

$\partial = (1-\lambda_2)(1-\lambda_3)\cdots(1-\lambda_n)$.

It will follow from later developments that

(2.1)    $0 < \partial \le n$.

A chain is said to be of weak character when $\partial$ is close to 0, and the chain is said to have a strong character when $\partial$ is significantly larger than 0.

If

$P = T^{-1}\begin{pmatrix} 1 & 0\\ 0 & C \end{pmatrix}T$

(e.g., this may be the reduction to Jordan form), where the spectral radius of $C$ is less than 1, then

$A = T^{-1}\begin{pmatrix} 0 & 0\\ 0 & I-C \end{pmatrix}T$ and $A^{\#} = T^{-1}\begin{pmatrix} 0 & 0\\ 0 & (I-C)^{-1} \end{pmatrix}T$

[Campbell and Meyer (1991)], so $\partial = \det(I-C)$ and $1/\partial = \det\big((I-C)^{-1}\big)$. In other words, $\partial$ and $1/\partial$ are the respective determinants of the nonsingular parts of $A$ and $A^{\#}$ in the sense that $\partial = \det\big(A_{/R(A)}\big)$ and $1/\partial = \det\big(A^{\#}_{/R(A)}\big)$, where $A_{/R(A)}$ denotes the linear operator defined by restricting $A$ to $R(A)$. It is also true that $1/\partial = \det(Z)$, where $Z$ is Kemeny and Snell's fundamental matrix.

¹The character was defined by Meyer (1993) to be $\sqrt[n-1]{(1-\lambda_2)(1-\lambda_3)\cdots(1-\lambda_n)}$, which is the normalization of the definition given here.
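Since the character is used throughout what follows, here is a small added sketch of how it can be computed, together with a check of the identity $1/\partial = \det(Z)$; the 3-state chain is an arbitrary choice, and NumPy is assumed.

```python
import numpy as np

def character(P):
    # Product of (1 - lambda_i) over the eigenvalues of P other than the Perron root 1.
    vals = np.linalg.eigvals(P)
    return np.prod(1.0 - np.delete(vals, np.argmin(np.abs(vals - 1.0)))).real

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
n = P.shape[0]
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi /= pi.sum()
Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))   # fundamental matrix
print(np.isclose(1.0 / character(P), np.linalg.det(Z)))       # True: 1/char = det(Z)
```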

The main result of this paper is the following theorem, which establishes the connection between the condition of an irreducible chain and its character.

Theorem 2.1. For an irreducible stochastic matrix $P_{n\times n}$, let $A = I - P$, and for $i \ne j$, let $\delta_{ij}(A)$ denote the deleted product of diagonal entries

$\delta_{ij}(A) = \prod_{k\ne i,j} a_{kk} = \prod_{k\ne i,j}(1 - p_{kk})$.

If $\delta = \max_{i\ne j}\delta_{ij}(A)$ (the product of all but the two smallest diagonal entries), then the condition number $\kappa$ is bounded by

(2.2)    $\frac{1}{n\min_{\lambda_i\ne 1}|1-\lambda_i|} \le \kappa < \frac{2\delta(n-1)}{\partial} \le \frac{2(n-1)}{\partial}$.

The proof of this theorem depends on exploiting the rich structure of $A$, some of which is apparent, and some of which requires illumination. Before giving a formal argument, it is necessary to detail the various components of this structure, so the important facets are first laid out in §3 as a sequence of lemmas. After the necessary framework is in place, it will be a simple matter to connect the lemmas together in order to construct a proof; this is contained in §4. By combining Theorem 2.1 with (1.4) and the other facts listed in §1, we arrive at the following conclusion.

Theorem 2.2. The condition of an irreducible Markov chain is primarily governed by how close the subdominant eigenvalues of the chain are to 1. More precisely, if an irreducible chain is well conditioned, then all subdominant eigenvalues must be well separated from 1, and if all subdominant eigenvalues are well separated from 1 in the sense that the chain has a strong character, then it must be well conditioned.

It is a corollary of Theorem 2.1 that if $\max_{\lambda_i\ne 1}|\lambda_i| \ll 1$, then the chain is not overly sensitive, but it is important to underscore the point that the issue of sensitivity is not equivalent to the question of how close $\max_{\lambda_i\ne 1}|\lambda_i|$ is to 1. Knowing that some $|\lambda_i| \approx 1$ is not sufficient to guarantee that the chain is sensitive; e.g., consider the well-conditioned periodic chain (or any small perturbation thereof) for which

$P = \begin{pmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{pmatrix}$ and $A^{\#} = \frac{1}{3}\begin{pmatrix} 1 & -1 & 0\\ 0 & 1 & -1\\ -1 & 0 & 1 \end{pmatrix}$.

3. The underlying structure. The purpose of this section is to organize relevant properties of $A = I - P$ into a sequence of lemmas from which the formal proof of Theorem 2.1 can be constructed. Some of the more transparent or well-known features of $A$ are stated in the first lemma.

Lemma 3.1. If $A = I - P$ where $P_{n\times n}$ is an irreducible stochastic matrix, then the following statements are true.

(3.1) $A$ as well as each principal submatrix of $A$ has strictly positive diagonal entries, and the off-diagonal entries are nonpositive.

(3.2) $A$ is a singular M-matrix of rank $n - 1$.

(3.3) If $B_{k\times k}$ ($k < n$) is a principal submatrix of $A$, then each of the following statements is true:
(a) $B$ is a nonsingular M-matrix.
(b) $B^{-1} \ge 0$.
(c) $\det(B) > 0$.
(d) $B$ is diagonally dominant.
(e) $\det(B) \le b_{11}b_{22}\cdots b_{kk} \le 1$.

Proof. These facts are either self-evident, or they are direct consequences of well-known results; see Berman and Plemmons (1979) or Horn and Johnson (1991).
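These properties are easy to observe numerically. The sketch below (added here as an illustration) examines statements (3.3b), (3.3c), and (3.3e) on a principal submatrix of a randomly generated chain.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((6, 6))
P /= P.sum(axis=1, keepdims=True)      # dense random chain, hence irreducible
A = np.eye(6) - P
B = A[:4, :4]                          # a principal submatrix with k < n
d = np.diag(B)
print(np.all(np.linalg.inv(B) >= 0))               # (3.3b)
print(np.linalg.det(B) > 0)                        # (3.3c)
print(np.linalg.det(B) <= d.prod() <= 1.0)         # (3.3e)
```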

Part of the less transparent structure of $A$ is illuminated in the following sequence of lemmas.

Lemma 3.2. If $P_{n\times n}$ is an irreducible stochastic matrix, and if $A_i$ denotes the principal submatrix of $A = I - P$ obtained by deleting the $i$th row and column from $A$, then

$\partial = \sum_{i=1}^n \det(A_i)$.

Proof. Suppose that the eigenvalues of $A$ are denoted by $\{\mu_1, \mu_2, \ldots, \mu_n\}$, and write the characteristic equation for $A$ as

$x^n + \alpha_{n-1}x^{n-1} + \cdots + \alpha_1 x + \alpha_0 = 0$.

Each coefficient $\alpha_{n-k}$ is given by $(-1)^k$ times the sum of the products of the eigenvalues of $A$ taken $k$ at a time. That is,

(3.4)    $\alpha_{n-k} = (-1)^k \sum_{1\le i_1 < \cdots < i_k \le n} \mu_{i_1}\mu_{i_2}\cdots\mu_{i_k}$.

But it is also a standard result from elementary matrix theory that each coefficient $\alpha_{n-k}$ can be described as

$\alpha_{n-k} = (-1)^k \sum\,(\text{all } k\times k \text{ principal minors of } A)$.

Since 0 is a simple eigenvalue for $A$, there is only one nonzero term in the sum (3.4) when $k = n-1$, and hence

$\alpha_1 = (-1)^{n-1}\mu_2\mu_3\cdots\mu_n = (-1)^{n-1}(1-\lambda_2)(1-\lambda_3)\cdots(1-\lambda_n)$ and $\alpha_1 = (-1)^{n-1}\sum_{i=1}^n \det(A_i)$.

Therefore,

$\sum_{i=1}^n \det(A_i) = \prod_{k=2}^n (1-\lambda_k) = \partial$.

Lemma 3.3. If $A_i$ denotes the principal submatrix of $A = I - P$ obtained by deleting the $i$th row and column from $A$, and if $\pi_i$ is the $i$th stationary probability, then the character of the chain is given by

$\partial = \frac{\det(A_i)}{\pi_i}$.

Proof. This result follows directly from Lemma 3.2 and the fact that the stationary distribution $\pi^T$ is given by the formula

$\pi^T = \frac{\big(\det(A_1), \det(A_2), \ldots, \det(A_n)\big)}{\sum_{i=1}^n \det(A_i)}$

[Golub and Meyer (1986) or Iosifescu (1980, p. 123)].
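Both lemmas can be confirmed directly. The added sketch below checks $\partial = \sum_i \det(A_i)$ and $\partial = \det(A_i)/\pi_i$ for every $i$ on an arbitrary 3-state chain.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
n = P.shape[0]
A = np.eye(n) - P
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi /= pi.sum()

# det(A_i): delete the i-th row and column of A.
minors = np.array([np.linalg.det(np.delete(np.delete(A, i, 0), i, 1))
                   for i in range(n)])
ev = np.linalg.eigvals(P)
char = np.prod(1.0 - np.delete(ev, np.argmin(np.abs(ev - 1.0)))).real

print(np.isclose(minors.sum(), char))      # Lemma 3.2
print(np.allclose(minors / pi, char))      # Lemma 3.3 (holds for every i)
```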

The mean return time for the $k$th state is $R_k = 1/\pi_k$ [Kemeny and Snell (1960)], and, since not all of the $\pi_k$'s can be less than $1/n$, there must exist a state such that $R_k \le n$. By combining this with (3.3c) and (3.3e), an interesting corollary which proves (2.1) is produced.

Corollary 3.1. If $R_k$ denotes the mean return time for the $k$th state, then

$0 < \partial = \frac{\det(A_i)}{\pi_i} \le \min_k R_k \le n$ for each $i = 1, 2, \ldots, n$.

Lemma 3.4. If $A = I - P$ where $P_{n\times n}$ is an irreducible stochastic matrix, and if $B_{k\times k}$ ($k < n$) is a principal submatrix of $A$, then the largest entry in each column of $B^{-1}$ is the diagonal entry. That is, for $j = 1, 2, \ldots, k$, it must be the case that

$[B^{-1}]_{jj} \ge [B^{-1}]_{ij}$ for each $i \ne j$.

At least two different proofs are possible, and we shall give both because each is instructive in its own right. The first argument is shorter and more probabilistic, but it rests on a result which requires a proof of its own. The second argument involves more algebraic details, but it is entirely self-contained and depends only on elementary concepts.

Probabilistic proof. Without loss of generality, assume that $B$ is the leading $k\times k$ principal submatrix of $A$ so that $P$ has the form

$P = \begin{pmatrix} I - B & *\\ * & * \end{pmatrix}$.

Consider any pair of states $i$ and $j$ in the set $S = \{1, 2, \ldots, k\}$, and let $N_j$ denote the number of times the process is in state $j$ before first hitting a state in the complement $\bar S = \{k+1, k+2, \ldots, n\}$. If $X_n$ denotes the state of the process after $n$ steps, and if

$h_{ij} = P\big(\text{hitting state } j \text{ before entering } \bar S \mid X_0 = i\big)$,

then

(3.5)    $E[N_j \mid X_0 = i] = d_{ij} + h_{ij}\,E[N_j \mid X_0 = j]$, where $d_{ij} = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i \ne j. \end{cases}$

This statement (which appears without proof on p. 62 in Kemeny and Snell (1960)) is intuitive, but it is not trivial. The theory of absorbing chains says that $[B^{-1}]_{ij} = E[N_j \mid X_0 = i]$, so for $i \ne j$ we have

$[B^{-1}]_{ij} = h_{ij}[B^{-1}]_{jj} \le [B^{-1}]_{jj}$.

Algebraic proof. Assume that $B$ is the leading $k\times k$ principal submatrix of $A$, and suppose the states have been arranged so that the $j$th state is listed first and the $i$th state is listed second. The goal is to prove that $[B^{-1}]_{11} \ge [B^{-1}]_{21}$. Because

$[B^{-1}]_{11} = \frac{\det(B_{11})}{\det(B)}$ and $[B^{-1}]_{21} = -\frac{\det(B_{12})}{\det(B)}$,

where $B_{ij}$ denotes the submatrix of $B$ obtained by deleting the $i$th row and $j$th column from $B$, and because Lemma 3.1 guarantees that $\det(B) > 0$, it suffices to prove that

$\det(B_{11}) + \det(B_{12}) \ge 0$.

Denote the first unit vector by $e_1^T = (1, 0, \ldots, 0)$, and partition $B$ as

(3.6)    $B = \begin{pmatrix} 1-p_{11} & -p_{12} & \cdots & -p_{1k}\\ -p_{21} & 1-p_{22} & \cdots & -p_{2k}\\ \vdots & \vdots & & \vdots\\ -p_{k1} & -p_{k2} & \cdots & 1-p_{kk} \end{pmatrix} = \begin{pmatrix} 1-p_{11} & -p_{12} & \cdots & -p_{1k}\\ b_1 & b_2 & \cdots & b_k \end{pmatrix}$.

In terms of these quantities, $\det(B_{11}) + \det(B_{12})$ is given by

$\det(B_{11}) + \det(B_{12}) = \det\big(b_2\ b_3\ \cdots\ b_k\big) + \det\big(b_1\ b_3\ \cdots\ b_k\big) = \det\big(b_2 + b_1\ \ b_3\ \cdots\ b_k\big) = \det\big(B_{11} + b_1e_1^T\big) = \det(B_{11})\big(1 + e_1^TB_{11}^{-1}b_1\big)$.

Lemma 3.1 also insures that $\det(B_{11}) > 0$, so the proof can be completed by arguing that $1 + e_1^TB_{11}^{-1}b_1 \ge 0$. To do so, modify the chain by making state 1 as well as states $k+1, k+2, \ldots, n$ absorbing states so that the transition matrix has the form

$\tilde P = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0\\ p_{21} & p_{22} & p_{23} & \cdots & p_{2k} & p_{2,k+1} & \cdots & p_{2n}\\ p_{31} & p_{32} & p_{33} & \cdots & p_{3k} & p_{3,k+1} & \cdots & p_{3n}\\ \vdots & & & & & & & \vdots\\ p_{k1} & p_{k2} & p_{k3} & \cdots & p_{kk} & p_{k,k+1} & \cdots & p_{kn}\\ 0 & 0 & 0 & \cdots & 0 & 1 & \cdots & 0\\ \vdots & & & & & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0\\ -b_1 & Q & R\\ 0 & 0 & I_{n-k} \end{pmatrix}$.

It follows from the elementary theory of absorbing chains that the entries in the matrix

$(I - Q)^{-1}\big(-b_1 \mid R\big) = B_{11}^{-1}\big(-b_1 \mid R\big)$

represent the various absorption probabilities, and consequently all entries in $-B_{11}^{-1}b_1$ are between 0 and 1, so that

$0 \le 1 + e_1^TB_{11}^{-1}b_1 \le 1$.

Note. Although it may not be of optimal efficiency, the algebraic argument given above is also a proof of the statement (3.5).
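Lemma 3.4 is also easy to observe numerically (an added illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.random((8, 8))
P /= P.sum(axis=1, keepdims=True)      # dense random chain, hence irreducible
B = (np.eye(8) - P)[:5, :5]            # a principal submatrix with k < n
Binv = np.linalg.inv(B)
# Column maxima of B^{-1} occur on the diagonal, as Lemma 3.4 asserts.
print(np.all(np.diag(Binv) >= Binv.max(axis=0) - 1e-12))   # True
```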

Lemma 3.5. If $A = I - P$ where $P_{n\times n}$ is an irreducible stochastic matrix, and if $B_{k\times k}$ ($k < n$) is a principal submatrix of $A$, then

$0 < \det(B) \le \frac{\max_i \delta_i(B)}{\max_{i,j}[B^{-1}]_{ij}} \le \frac{1}{\max_{i,j}[B^{-1}]_{ij}}$,

where $\delta_r(B)$ denotes the deleted product

$\delta_r(B) = b_{11}b_{22}\cdots b_{kk}/b_{rr}$.

Proof. Lemma 3.4 insures that there is some diagonal entry $[B^{-1}]_{rr}$ of $B^{-1}$ such that

(3.7)    $[B^{-1}]_{rr} = \max_{i,j}[B^{-1}]_{ij}$.

If $B_{rr}$ is the principal submatrix of $B$ obtained by deleting the $r$th row and column from $B$, then (3.3e) together with (3.7) produces

$\det(B) = \frac{\det(B_{rr})}{[B^{-1}]_{rr}} \le \frac{\delta_r(B)}{[B^{-1}]_{rr}} = \frac{\delta_r(B)}{\max_{i,j}[B^{-1}]_{ij}} \le \frac{\max_i \delta_i(B)}{\max_{i,j}[B^{-1}]_{ij}} \le \frac{1}{\max_{i,j}[B^{-1}]_{ij}}$.

Lemma 3.6. For an irreducible stochastic matrix $P_{n\times n}$, let $A_j$ be the principal submatrix of $A = I - P$ obtained by deleting the $j$th row and column from $A$, and let $Q$ be the permutation matrix such that

$Q^TAQ = \begin{pmatrix} A_j & c_j\\ d_j^T & a_{jj} \end{pmatrix}$.

If the stationary distribution for $Q^TPQ$ is written as $\psi^T = \pi^TQ = (\hat\pi^T, \pi_j)$, so that $\hat\pi^T$ is $\pi^T$ with its $j$th component removed, then the group inverse of $A$ is given by

$A^{\#} = Q\begin{pmatrix} (I - e\hat\pi^T)A_j^{-1}(I - e\hat\pi^T) & -\pi_j(I - e\hat\pi^T)A_j^{-1}e\\ -\hat\pi^TA_j^{-1}(I - e\hat\pi^T) & \pi_j\hat\pi^TA_j^{-1}e \end{pmatrix}Q^T$,

where $e$ is a column of 1's whose size is determined by the context in which it appears.

Proof. The group inverse possesses the property that $(T^{-1}AT)^{\#} = T^{-1}A^{\#}T$ for all nonsingular matrices $T$ [Campbell and Meyer (1991)], so

$Q^TA^{\#}Q = \begin{pmatrix} A_j & c_j\\ d_j^T & a_{jj} \end{pmatrix}^{\#}$.

Since $\mathrm{rank}(Q^TAQ) = n - 1$, it follows that $a_{jj} - d_j^TA_j^{-1}c_j = 0$, and this is used to verify that

$\begin{pmatrix} A_j & c_j\\ d_j^T & a_{jj} \end{pmatrix}^{\#} = (I - e\psi^T)\begin{pmatrix} A_j^{-1} & 0\\ 0 & 0 \end{pmatrix}(I - e\psi^T) = \begin{pmatrix} (I - e\hat\pi^T)A_j^{-1}(I - e\hat\pi^T) & -\pi_j(I - e\hat\pi^T)A_j^{-1}e\\ -\hat\pi^TA_j^{-1}(I - e\hat\pi^T) & \pi_j\hat\pi^TA_j^{-1}e \end{pmatrix}$.
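The block formula of Lemma 3.6 can be checked against $A^{\#}$ computed from the fundamental matrix. In the added sketch below, the permutation lists state $j$ last, mirroring $Q^TA^{\#}Q$; the chain and the choice $j = 1$ (0-based) are arbitrary.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
n, j = P.shape[0], 1                  # delete state j (0-based index)
A = np.eye(n) - P
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi /= pi.sum()

Aj = np.delete(np.delete(A, j, 0), j, 1)
pj, ph = pi[j], np.delete(pi, j)      # pi_j and the reduced vector pihat
e = np.ones(n - 1)
M = np.eye(n - 1) - np.outer(e, ph)   # I - e*pihat^T
Ainv = np.linalg.inv(Aj)

top = np.hstack([M @ Ainv @ M, (-pj * M @ Ainv @ e)[:, None]])
bot = np.hstack([-ph @ Ainv @ M, [pj * ph @ Ainv @ e]])[None, :]
block = np.vstack([top, bot])

Ag = np.linalg.inv(A + np.outer(np.ones(n), pi)) - np.outer(np.ones(n), pi)
perm = [0, 2, 1]                      # reorder states so that state j comes last
print(np.allclose(block, Ag[np.ix_(perm, perm)]))   # True
```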

4. Proof of the main theorem. The preceding sequence of lemmas is now connected together to prove the primary results stated in Theorem 2.1.

The upper bound. To derive the inequalities

(4.1)    $\max_{i,j}|a^{\#}_{ij}| < \frac{2\delta(n-1)}{\partial} \le \frac{2(n-1)}{\partial}$,

begin by letting $Q$ be the permutation matrix given in Lemma 3.6 so that for $i \ne j$, the $(i,j)$-entry of $A^{\#}$ is the $(k,n)$-entry of $Q^TA^{\#}Q$ where $k \ne n$. In succession, use the formula of Lemma 3.6 and Hölder's inequality followed by the results of Lemmas 3.5 and 3.3 to write

$|a^{\#}_{ij}| = \pi_j\big|e_k^T(I - e\hat\pi^T)A_j^{-1}e\big| \le \pi_j\,\|e_k - \hat\pi\|_1\,\|A_j^{-1}e\|_\infty < 2\pi_j\,\|A_j^{-1}\|_\infty$
$\le 2\pi_j(n-1)\max_{r,s}\big[A_j^{-1}\big]_{rs} \le \frac{2\pi_j(n-1)\max_i\delta_i(A_j)}{\det(A_j)} \le \frac{2\pi_j(n-1)\,\delta}{\det(A_j)} = \frac{2\delta(n-1)}{\partial} \le \frac{2(n-1)}{\partial}$.

Now consider the diagonal elements. The $(j,j)$-entry of $A^{\#}$ is the $(n,n)$-entry of $Q^TA^{\#}Q$, so proceeding in a manner similar to that above produces

$a^{\#}_{jj} = \pi_j\,\hat\pi^TA_j^{-1}e \le \pi_j\,\|\hat\pi\|_1\,\|A_j^{-1}e\|_\infty < \pi_j\,\|A_j^{-1}\|_\infty \le \pi_j(n-1)\max_{r,s}\big[A_j^{-1}\big]_{rs} \le \frac{\pi_j(n-1)\,\delta}{\det(A_j)} = \frac{\delta(n-1)}{\partial} \le \frac{n-1}{\partial}$,

thus proving (4.1).

The lower bound. To establish that

(4.2)    $\frac{1}{n\min_{\lambda_i\ne 1}|1-\lambda_i|} \le \max_{i,j}|a^{\#}_{ij}|$,

make use of the fact that if $Ax = \mu x$ for $\mu \ne 0$, then $A^{\#}x = \mu^{-1}x$ [Campbell and Meyer (1991, p. 129)]. In particular, if $\lambda \ne 1$ is an eigenvalue of $P$, and if $x$ is a corresponding eigenvector, then $Ax = (1-\lambda)x$ implies that $A^{\#}x = (1-\lambda)^{-1}x$, so

$\frac{1}{|1-\lambda|} \le \|A^{\#}\|_\infty \le n\max_{i,j}|a^{\#}_{ij}|$,

and (4.2) follows by choosing $\lambda$ to minimize $|1-\lambda|$ over the subdominant eigenvalues.
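The two bounds just established, i.e., (2.2), can be evaluated on the periodic chain of §2. The added sketch below shows that $\kappa = 1/3$ sits comfortably between them.

```python
import numpy as np

P = np.array([[0., 0., 1.],
              [1., 0., 0.],
              [0., 1., 0.]])            # the periodic chain from Section 2
n = 3
A = np.eye(n) - P
pi = np.full(n, 1/3)
Ag = np.linalg.inv(A + np.outer(np.ones(n), pi)) - np.outer(np.ones(n), pi)
kappa = np.abs(Ag).max()                # condition number: 1/3

vals = np.linalg.eigvals(P)
sub = np.delete(vals, np.argmin(np.abs(vals - 1.0)))
char = np.prod(1.0 - sub).real          # character: 3, the largest possible value
delta = np.sort(np.diag(A))[2:].prod()  # all but the two smallest diagonal entries
lower = 1.0 / (n * np.abs(1.0 - sub).min())
upper = 2.0 * delta * (n - 1) / char
print(lower, kappa, upper)              # ~0.19 <= 1/3 < ~1.33
```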

5. Using an LU factorization. Except for chains which are too large to fit into a computer's main memory, the stationary distribution $\pi^T$ is generally computed by direct methods; i.e., either an LU or QR factorization of $A = I - P$ (or $A^T$) is computed [Harrod and Plemmons (1984); Grassmann, Taksar, and Heyman (1985); Funderlic and Meyer (1986); Golub and Meyer (1986); Barlow (1993)]. Even for very large chains which are nearly uncoupled, direct methods are usually involved; they can be the basis of the main algorithm [Stewart and Zhang (1991)], or they can be used to solve the aggregated and coupling chains in iterative aggregation/disaggregation algorithms [Chatelin and Miranker (1982), Haviv (1987)].

In the conclusion of their paper, Golub and Meyer (1986) make the following observation. Computational experience suggests that when a triangular factorization of $A_{n\times n}$ is used to solve an irreducible chain, the condition of the chain seems to be a function of the size of the nonzero pivots, and this means that it should be possible to estimate $\kappa$ with little or no extra cost beyond that incurred in computing $\pi^T$. For large chains, this can be a significant savings over the $O(n^2)$ operations demanded by traditional condition estimators. Of course, this is contrary to the situation which exists for general nonsingular matrices, because the absence of small pivots (or the existence of a large determinant) is not a guarantee of a well-conditioned matrix; consider the matrix in (1.6).

A mathematical formulation and proof (or even an intuitive explanation) of Golub and Meyer's observation has heretofore not been given, but the results of §2 and §3 now make it possible to give a more precise statement and a rigorous proof of the Golub-Meyer observation. The arguments hinge on the fact that whenever $\pi^T$ is computed by means of a triangular factorization of $A$ (or $A^T$), the character of the chain is always an immediate by-product. The results for an LU factorization are given below, and the analogous theory for a QR factorization is given in the next section.

Suppose that the LU factorization² of $A = I - P$ is computed to be

$A = LU = \begin{pmatrix} L_n & 0\\ r^T & 1 \end{pmatrix}\begin{pmatrix} U_n & c\\ 0 & 0 \end{pmatrix}$.

If $A_n$ is the principal submatrix of $A$ obtained by deleting the last row and column from $A$, then $A_n$ is a nonsingular M-matrix, and its LU factorization is $A_n = L_nU_n$. Since the LU factors of a nonsingular M-matrix are also nonsingular M-matrices [Berman and Plemmons (1979), Horn and Johnson (1991)], it follows that $L_n$ and $U_n$ are nonsingular M-matrices, and hence $L_n^{-1} \ge 0$ and $U_n^{-1} \ge 0$. Consequently, $r^T \le 0$, so the solution (obtained by a simple substitution process with no divisions) of the nonsingular triangular system $x^TL_n = -r^T$ is nonnegative. This together with the result of Lemma 3.3 and Theorem 2.1 produces the following conclusion.

²Regardless of whether $A$ or $A^T$ is used, Gaussian elimination with finite-precision arithmetic can prematurely produce a zero (or even a negative) pivot, and this can happen for well-conditioned chains. Practical implementation demands a strategy to deal with this situation, and Funderlic and Meyer (1986) and Stewart and Zhang (1991) discuss this problem along with possible remedies. Practical algorithms involve reordering schemes which introduce permutation matrices, but these permutations are not important in the context of this section, so they are suppressed.

Theorem 5.1. For an irreducible Markov chain whose transition matrix is $P$, let the LU factorization of $A = I - P$ be given by

$A = LU = \begin{pmatrix} L_n & 0\\ r^T & 1 \end{pmatrix}\begin{pmatrix} U_n & c\\ 0 & 0 \end{pmatrix}$.

If $x^T$ is the solution of $x^TL_n = -r^T$, then each of the following statements is true.

The stationary distribution of the chain is

(5.1)    $\pi^T = \frac{1}{1 + \|x\|_1}\,(x^T, 1)$.

The character of the chain is

(5.2)    $\partial = \frac{\det(U_n)}{\pi_n} = (1 + \|x\|_1)\det(U_n)$.

The condition number for the chain is bounded above by

(5.3)    $\kappa < \frac{2\delta(n-1)\pi_n}{\det(U_n)} = \frac{2\delta(n-1)}{(1 + \|x\|_1)\det(U_n)} \le \frac{2(n-1)}{(1 + \|x\|_1)\det(U_n)}$.

The condition number for the chain is bounded below by

(5.4)    $\pi_n\sum_{i=1}^{n-1}\frac{\pi_i}{u_{ii}} = \frac{1}{(1 + \|x\|_1)^2}\sum_{i=1}^{n-1}\frac{x_i}{u_{ii}} \le \kappa$,

where $u_{ii}$ is the $i$th pivot in $U_n$.

Proof. Statements (5.1), (5.2), and (5.3) are straightforward consequences of the previous discussion. To establish (5.4), first recall from Lemma 3.6 (with $j = n$, so that $\hat\pi^T = (\pi_1, \ldots, \pi_{n-1})$) that

$a^{\#}_{nn} = \pi_n\hat\pi^TA_n^{-1}e = \pi_n\hat\pi^TU_n^{-1}L_n^{-1}e > 0$.

Since $U_n^{-1} \ge 0$ and $L_n^{-1} \ge 0$, it follows that $\hat\pi^TU_n^{-1}$ and $L_n^{-1}e$ can be written as

$\hat\pi^TU_n^{-1} = \Big(\frac{\pi_1}{u_{11}},\ \frac{\pi_2}{u_{22}} + \alpha_2,\ \ldots,\ \frac{\pi_{n-1}}{u_{n-1,n-1}} + \alpha_{n-1}\Big)$ and $L_n^{-1}e = \big(1,\ 1 + \beta_2,\ \ldots,\ 1 + \beta_{n-1}\big)^T$,

where each $\alpha_i$ and $\beta_i$ is nonnegative, and consequently (setting $\alpha_1 = \beta_1 = 0$)

$\hat\pi^TA_n^{-1}e = \hat\pi^TU_n^{-1}L_n^{-1}e = \sum_{i=1}^{n-1}\Big(\frac{\pi_i}{u_{ii}} + \alpha_i\Big)(1 + \beta_i) \ge \sum_{i=1}^{n-1}\frac{\pi_i}{u_{ii}}$.

Therefore,

$\kappa \ge a^{\#}_{nn} = \pi_n\hat\pi^TU_n^{-1}L_n^{-1}e \ge \pi_n\sum_{i=1}^{n-1}\frac{\pi_i}{u_{ii}} = \frac{1}{(1 + \|x\|_1)^2}\sum_{i=1}^{n-1}\frac{x_i}{u_{ii}}$.

As mentioned before, the pivots or the determinant need not be indicators of the condition of a general nonsingular matrix. In particular, the absence of small pivots (or the existence of a large determinant) is not a guarantee of a well-conditioned matrix. However, for our special matrices $A = I - P$, the bounds in Theorem 5.1 allow the pivots to be used as condition estimators.
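Theorem 5.1 translates directly into a few lines of code. The sketch below is an added illustration, not a robust implementation: the local helper lu_nopivot is a naive Doolittle elimination without pivoting, chosen only to match the permutation-free form assumed in this section (see footnote 2 for why practical codes must do more).

```python
import numpy as np

def lu_nopivot(A):
    # Doolittle LU without pivoting; permutations are suppressed as in Section 5.
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        L[k+1:, k] = U[k+1:, k] / U[k, k]
        U[k+1:, k:] -= np.outer(L[k+1:, k], U[k, k:])
    return L, U

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
n = P.shape[0]
A = np.eye(n) - P
L, U = lu_nopivot(A)
Ln, r, Un = L[:n-1, :n-1], L[n-1, :n-1], U[:n-1, :n-1]

x = np.linalg.solve(Ln.T, -r)                 # x^T L_n = -r^T; x >= 0
pi = np.append(x, 1.0) / (1.0 + x.sum())      # (5.1)
char = (1.0 + x.sum()) * np.linalg.det(Un)    # (5.2): a by-product of the pivots

u = np.diag(Un)
lower = (x / u).sum() / (1.0 + x.sum())**2    # (5.4)
delta = np.sort(np.diag(A))[2:].prod()
upper = 2.0 * delta * (n - 1) / char          # (5.3)
print(pi)                    # [0.207 0.552 0.241]
print(lower, upper)          # the true kappa = max|a#_ij| lies in between
```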

Corollary 5.1. For an irreducible Markov chain whose transition matrix is $P$, suppose that the LU factorization of $A = I - P$ and the stationary distribution $\pi^T$ have been computed as described in Theorem 5.1.

If the pivots $u_{ii}$ are large relative to $\pi_n$ in the sense that $\pi_n/\det(U_n)$ is not too large, then the chain is well conditioned.

If there are pivots $u_{ii}$ which are small relative to the $\pi_i$'s in the sense that $\pi_n\sum_{i=1}^{n-1}\pi_i/u_{ii}$ is large, then the chain is ill conditioned.

6. Using a QR factorization. The utility of orthogonal triangularization is well documented in the vast literature on matrix computations, and the use of a QR factorization to solve and analyze Markov chains is discussed by Golub and Meyer (1986). The following theorem shows that the character of an irreducible chain can be directly obtained from the diagonal entries of $R$ and the last column of $Q$, and this will establish an upper bound using a QR factorization which is analogous to that in Theorem 5.1 for an LU factorization. A lower bound analogous to the one in Theorem 5.1 is not readily available.

Theorem 6.1. For an irreducible Markov chain whose transition matrix is $P$, let the QR factorization of $A = I - P$ be given by

$A = QR = \begin{pmatrix} Q_n & c\\ d^T & q_{nn} \end{pmatrix}\begin{pmatrix} R_n & -R_ne\\ 0 & 0 \end{pmatrix} = \begin{pmatrix} Q_nR_n & -Q_nR_ne\\ d^TR_n & -d^TR_ne \end{pmatrix}$.

If $q$ denotes the last column of $Q$, then each of the following statements is true.

The stationary distribution of the chain is

(6.1)    $\pi^T = \frac{q^T}{\sum_{i=1}^n q_{in}}$.

The character of the chain is

(6.2)    $\partial = \|q\|_1\,|\det(R_n)|$.

The condition number for the chain is bounded above by

(6.3)    $\kappa < \frac{2\delta(n-1)}{\|q\|_1\,|\det(R_n)|} \le \frac{2(n-1)}{\|q\|_1\,|\det(R_n)|}$.

Proof. The formula (6.1) for $\pi^T$ is derived in Golub and Meyer (1986). To prove (6.2), first recall the result of Lemma 3.3, and observe that

$\partial^2 = \frac{\big(\det A_n\big)^2}{\pi_n^2} = \frac{\big(\det Q_nR_n\big)^2}{\pi_n^2} = \frac{(\det Q_n)^2(\det R_n)^2}{q_{nn}^2/\|q\|_1^2}$.

Use the fact that $QQ^T = I$ implies $Q_nQ_n^T + cc^T = I$ to obtain

$(\det Q_n)^2 = \det\big(Q_nQ_n^T\big) = \det\big(I - cc^T\big) = 1 - c^Tc = q_{nn}^2$,

and substitute this into the previous expression to obtain (6.2). The bound (6.3) is now a consequence of the result of Theorem 2.1.
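The QR analogue is equally short (an added illustration, using NumPy's Householder QR):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
n = P.shape[0]
A = np.eye(n) - P

Q, R = np.linalg.qr(A)           # last row of R vanishes since rank(A) = n - 1
q = Q[:, -1]                     # q^T A = 0, so q is proportional to pi
pi = q / q.sum()                                            # (6.1)
char = np.abs(q).sum() * abs(np.linalg.det(R[:n-1, :n-1]))  # (6.2)
delta = np.sort(np.diag(A))[2:].prod()
upper = 2.0 * delta * (n - 1) / char                        # (6.3)
print(pi, char, upper)           # char = 0.29, matching the LU computation
```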

7. Concluding remarks. It has been argued that the sensitivity of an irreducible chain is primarily governed by how close the subdominant eigenvalues are to 1 in the sense that the condition number of the chain is bounded by

(7.1)    $\frac{1}{n\min_{\lambda_i\ne 1}|1-\lambda_i|} \le \kappa < \frac{2\delta(n-1)}{\partial}$.

Although the upper bound explicitly involves $n$, it is generally not the case that $2\delta(n-1)/\partial$ grows in proportion to $n$. Except in the special case when the diagonal entries of $P$ are 0, the term $\delta$ somewhat mitigates the presence of $n$ because as $n$ becomes larger, $\delta$ becomes smaller. Computational experience suggests that $2\delta(n-1)/\partial$ is usually a rather conservative estimate of $\kappa$, and the term $\delta/\partial$ by itself, although not always an upper bound for $\kappa$, is often of the same order of magnitude as $\kappa$. However, there exist pathological cases for which even $\delta/\partial$ severely overestimates $\kappa$. This seems to occur for chains which are not too badly conditioned and no single eigenvalue is extremely close to 1, but enough eigenvalues are within range of 1 to force $1/\partial$ to be too large. This suggests that for the purposes of bounding $\kappa$ above, perhaps not all of the subdominant eigenvalues need to be taken into account. In a forthcoming article, Seneta (1993) addresses this issue by an analysis involving coefficients of ergodicity.

When direct methods are used to solve an irreducible chain, standard condition estimators can be used to produce reliable estimates for $\kappa$, but the cost of doing so is $O(n^2)$ operations beyond the solution process. The results of Theorems 5.1 and 6.1 make it possible to estimate $\kappa$ with the same computations which produce $\pi^T$. Although the bounds for $\kappa$ produced by Theorem 5.1 are sometimes rather loose, they are nevertheless virtually free. One must balance the cost of obtaining condition estimates against the information one desires to obtain from these estimates.

8. Acknowledgments. The exposition of this article was enhanced by suggestions provided by Dianne O'Leary, Guy Latouche, and Paul Schweitzer.

REFERENCES

J. L. Barlow (1993), Error bounds for the computation of null vectors with applications to Markov chains, SIAM J. Matrix Anal. Appl., 14, pp. 598-618.

A. Berman and R. J. Plemmons (1979), Nonnegative Matrices in the Mathematical Sciences, Academic Press, New York.

S. L. Campbell and C. D. Meyer (1991), Generalized Inverses of Linear Transformations, Dover Publications, New York (1979 edition by Pitman Pub. Ltd., London).

F. Chatelin and W. L. Miranker (1982), Acceleration by aggregation of successive approximation methods, Linear Algebra Appl., 43, pp. 17-47.

R. E. Funderlic and C. D. Meyer (1986), Sensitivity of the stationary distribution vector for an ergodic Markov chain, Linear Algebra Appl., 76, pp. 1-17.

G. H. Golub and C. D. Meyer (1986), Using the QR factorization and group inversion to compute, differentiate, and estimate the sensitivity of stationary probabilities for Markov chains, SIAM J. Algebraic Discrete Meth., 7, pp. 273-281.

W. K. Grassmann, M. I. Taksar, and D. P. Heyman (1985), Regenerative analysis and steady state distributions for Markov chains, Oper. Res., 33, pp. 1107-1116.

W. J. Harrod and R. J. Plemmons (1984), Comparison of some direct methods for computing stationary distributions of Markov chains, SIAM J. Sci. Statist. Comput., 5, pp. 453-469.

M. Haviv (1987), Aggregation/disaggregation methods for computing the stationary distribution of a Markov chain, SIAM J. Numer. Anal., 24, pp. 952-966.

R. A. Horn and C. R. Johnson (1991), Topics in Matrix Analysis, Cambridge University Press, Cambridge.

M. Iosifescu (1980), Finite Markov Processes and Their Applications, John Wiley and Sons, New York.

I. C. F. Ipsen and C. D. Meyer (1994), Uniform stability of Markov chains, SIAM J. Matrix Anal. Appl., 15, pp. 1061-1074.

J. G. Kemeny and J. L. Snell (1960), Finite Markov Chains, D. Van Nostrand, New York.

C. D. Meyer (1975), The role of the group generalized inverse in the theory of finite Markov chains, SIAM Rev., 17, pp. 443-464.

C. D. Meyer (1980), The condition of a finite Markov chain and perturbation bounds for the limiting probabilities, SIAM J. Algebraic Discrete Meth., 1, pp. 273-283.

C. D. Meyer (1993), The character of a finite Markov chain, in Linear Algebra, Markov Chains, and Queueing Models, C. D. Meyer and R. J. Plemmons, eds., IMA Volumes in Mathematics and its Applications, Vol. 48, Springer-Verlag, New York, pp. 47-58.

C. D. Meyer and G. W. Stewart (1988), Derivatives and perturbations of eigenvectors, SIAM J. Numer. Anal., 25, pp. 679-691.

P. J. Schweitzer (1968), Perturbation theory and finite Markov chains, J. Appl. Probab., 5, pp. 401-413.

E. Seneta (1991), Sensitivity analysis, ergodicity coefficients, and rank-one updates for finite Markov chains, in Numerical Solution of Markov Chains, W. J. Stewart, ed., Probability: Pure and Applied, No. 8, Marcel Dekker, New York, pp. 121-129.

E. Seneta (1993), Sensitivity of finite Markov chains under perturbation, Statist. Probab. Lett., 17, to appear.

G. W. Stewart and G. Zhang (1991), On a direct method for the solution of nearly uncoupled Markov chains, Numer. Math., 59, pp. 1-11.