University of Maryland Institute for Advanced Computer Studies
Department of Computer Science
College Park

TR-94-77
TR-3306

On Markov Chains with Sluggish Transients

G. W. Stewart†

June, 1994

ABSTRACT

In this note it is shown how to construct a Markov chain whose subdominant eigenvalue does not predict the decay of its transient.

This report is available by anonymous ftp from thales.cs.umd.edu in the directory pub/reports.

†Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742. This work was supported in part by the National Science Foundation under grant CCR 95568.

On Markov Chains with Sluggish Transients

G. W. Stewart

ABSTRACT

In this note it is shown how to construct a Markov chain whose subdominant eigenvalue does not predict the decay of its transient.

1. Introduction

Let P be the transition matrix of a finite, irreducible, aperiodic Markov chain, and let e be the vector whose components are all one. Since the row sums of P are one, Pe = e; i.e., e is a right eigenvector of P corresponding to the eigenvalue one. From the theory of nonnegative matrices, this eigenvalue is simple and is strictly larger in magnitude than the remaining eigenvalues. Moreover, there is a unique, positive, left eigenvector π^T satisfying

    π^T P = π^T  and  π^T e = 1.

The vector π^T is the steady-state vector for the chain whose transition matrix is P.

If we set

    T = P − eπ^T,

then the eigenvalues of P and T are the same, except that T has an eigenvalue of zero corresponding to the eigenvalue one of P. Now it is easily verified that

    P^k = eπ^T + T^k.

Since the eigenvalues of T are less than one in magnitude, lim_{k→∞} T^k = 0, or equivalently

    lim_{k→∞} P^k = eπ^T.

In other words, the powers of the transition matrix converge to a matrix whose rows are the steady-state vector.

These facts are well known. In this note we will be concerned with the rate of convergence of the powers of P. The folklore says that the convergence is proportional to ρ^k, where ρ is the magnitude of the largest eigenvalue of the transient matrix T. A justification for this rule of thumb is the equation

    lim_{k→∞} ||T^k||^{1/k} = ρ,

which holds for any consistent matrix norm. In plain words, if we average the rate of convergence over k iterations, then with increasing k the average approaches ρ.
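These relations are easy to check numerically. The following MATLAB sketch (the 3×3 chain is an arbitrary illustration, not one of the chains constructed in this note) computes π from a left eigenvector of P and verifies that P^k − eπ^T = T^k:

    % Steady-state vector and transient decay for an arbitrary 3x3 chain.
    P = [0.5 0.3 0.2; 0.2 0.6 0.2; 0.3 0.3 0.4];
    e = ones(3,1);
    [V,D] = eig(P');                  % left eigenvectors of P
    [~,j] = min(abs(diag(D) - 1));    % locate the eigenvalue one
    pi = V(:,j)'/(V(:,j)'*e);         % normalize so that pi*e = 1
                                      % (as in the appendix, pi here
                                      % overrides matlab's built-in constant)
    T = P - e*pi;                     % the transient matrix
    k = 20;
    norm(P^k - (e*pi + T^k))          % essentially zero: P^k = e*pi + T^k
    norm(P^k - e*pi)                  % equals ||T^k||, the transient decay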

Unfortunately, this asymptotic result does not tell us what happens in the short run. And even in the long run, it is possible for ||T^k||/ρ^k to diverge to infinity. It might be conjectured that the transition matrices of Markov chains are so structured that pathological cases cannot occur. As it turns out, certain pathologies do seem to be excluded. But, as we shall see, a Markov chain can have a sluggish transient that falls behind the decay of its subdominant eigenvalue.

Since our purpose is to produce counterexamples, it is not necessary to analyze the problem in full generality. Consequently, in the next section we analyze the behavior of the powers of a 2×2 Jordan block. In §3 we show how to construct a Markov chain with a sluggish transient. The note concludes with some general observations.

Throughout the paper ||A|| will denote the spectral norm of A; that is, the square root of the largest eigenvalue of A^T A.

2. Two by Two Jordan Blocks

The advantage of considering a 2×2 Jordan block is that it exhibits pathologies found in larger blocks yet is simple enough to analyze. We will write our block, somewhat unusually, in the form

    J = [ρ, ρσ; 0, ρ],

where 0 < ρ < 1 and σ > 0. It is easy to see that

    J^k = ρ^k [1, kσ; 0, 1].    (2.1)

The first conclusion we can draw from this expression is that for large k

    ||J^k|| ≈ kσρ^k.    (2.2)

Thus the ratio ||J^k||/ρ^k approaches infinity, a possibility alluded to in the introduction. From (2.2) it follows that for k large enough

    ||J^{k+1}||/||J^k|| ≈ ρ(1 + 1/k).    (2.3)

Thus the local rate of convergence is ρ(1 + 1/k).

At first glance, it might seem that the factor 1 + 1/k, which approaches one, represents an insignificant perturbation of ρ. However, when ρ is near one, it is important in two respects.
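A few lines of MATLAB make (2.2) and (2.3) concrete (a sketch; the values ρ = 0.99 and σ = 0.01 anticipate those used in §3):

    rho = 0.99; sigma = 0.01;         % values used in the example of Section 3
    J = [rho, rho*sigma; 0, rho];
    k = 500;
    nk = norm(J^k); nk1 = norm(J^(k+1));
    nk/(k*sigma*rho^k)                % near one, confirming (2.2)
    [nk1/nk, rho*(1 + 1/k)]           % observed local rate vs. (2.3)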

First, the analysis is asymptotic, and for the factor to represent a decrease we must have

    k > ρ/(1 − ρ).    (2.4)

For example, if ρ = 0.99, then k will have to be greater than 99. We will return to this point later.

Second, even when k satisfies (2.4), the small change introduced by the factor 1 + 1/k has an inordinate effect on the number of iterations required to reduce ||J^k||. To see why, note that if a process is converging to zero as γ^k, then for η > 1

    N_η = log η / log(1/γ)    (2.5)

is the number of iterations required for a reduction by a factor of η. Let us call N_η the η-fold reduction time. Now if η = e, then from (2.3) and (2.5) [with generous use of the approximation ln(1 + x) = x + O(x²)] we find that the local e-fold reduction time is

    N_e ≈ 1/((1 − ρ) − 1/k) = k/((1 − ρ)k − 1).

Here is a table of the local e-fold reduction time for ρ = 0.99.

      k    N_e
     200   200
     300   150
     400   133
     500   125

In this case, the local e-fold reduction time converges very slowly, in fact harmonically, to its asymptotic value of 100.

Up to this point we have considered the asymptotic behavior of ||J^k|| with k restricted to satisfy (2.4). The reason for the restriction is that for small k the element kσρ^k of J^k can increase. But whether this increase induces a corresponding increase in ||J^k|| depends on the size of σ. If σ is large enough, ||J^k|| will show an initial increase and then turn around and begin to decrease at the local rate given by (2.3). On the other hand, when σ is small, ||J^k|| decreases monotonically, perhaps at first at a rate near that of ρ^k, but asymptotically at the local rate. It is worthwhile to determine the critical value of σ that separates the two regimes.

It follows directly from the definition of the spectral norm that

    ||[1, τ; 0, 1]||² = 1 + (τ² + τ√(4 + τ²))/2 = 1 + τ + O(τ²).    (2.6)
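Both the reduction-time table and the expansion (2.6) are easily checked numerically; the sketch below assumes ρ = 0.99 as above:

    rho = 0.99;
    for k = [200 300 400 500]
      fprintf('k = %3d   N_e = %6.1f\n', k, k/((1 - rho)*k - 1));
    end
    % prints 200.0, 150.0, 133.3, 125.0; the limit is 1/(1-rho) = 100
    tau = 0.01;
    [norm([1 tau; 0 1])^2, 1 + tau]   % the two sides of (2.6) agree to O(tau^2)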

Figure 2.1: Modes of decay: ρ = 0.99, σ = 0.035, 0.02, 0.005, 0

Moreover, if ρ = 1 − δ, where δ is small, then for small k the square of ρ^k is

    ρ^{2k} ≈ 1 − 2kδ.

Setting τ = kσ in (2.6), we have from (2.1) that

    ||J^k||² ≈ (1 − 2kδ)(1 + kσ) ≈ 1 − 2kδ + kσ.

It follows that if

    −2kδ + kσ = 0,  or  σ = 2δ = 2(1 − ρ),

then initially the norms ||J^k|| are approximately stationary. Figure 2.1 plots log₁₀(||J^k||) for ρ = 0.99 and for supercritical (0.035), critical (0.02), subcritical (0.005), and pure (0.00) values of σ. Note the very different behaviors.
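Figure 2.1 can be regenerated along the following lines (a sketch; the legend labels correspond to the four values of σ):

    % The four decay modes of Figure 2.1.
    rho = 0.99; sigmas = [0.035 0.02 0.005 0];
    kmax = 500; y = zeros(kmax, 4);
    for j = 1:4
      J = [rho, rho*sigmas(j); 0, rho];
      Jk = eye(2);
      for k = 1:kmax
        Jk = Jk*J;                    % running power J^k
        y(k,j) = log10(norm(Jk));
      end
    end
    plot(y)
    legend('supercritical', 'critical', 'subcritical', 'pure')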

3. A Sluggish Markov Chain

In this section, we will show how to imbed a 2×2 Jordan block in a 3×3 Markov chain. The algorithm goes as follows.

1. Choose ρ and σ and generate the matrix

       D = [0, 0, 0; 0, 1−ρ, −ρσ; 0, 0, 1−ρ].

2. Generate a random positive vector π^T, normalized so that π^T e = 1.

3. Generate a random 3×2 matrix U such that U^T U = I and U^T e = 0.

4. Set W^T = (π, U).

5. Calculate Q = W⁻¹DW.

6. If Q does not have positive diagonal elements and negative off-diagonal elements, go to step 2.

7. Set P = I − Q.

Here are some comments on the algorithm. For details consult the MATLAB implementation in the appendix to this paper.

To generate U, a random normal 3×2 matrix is generated and its columns are orthogonalized against e and each other and then normalized. The QR decomposition can be used to accomplish the orthonormalization.

The first column of W⁻¹ is e. For by the definition of the inverse, it must satisfy π^T w⁽¹⁾ = 1 and U^T w⁽¹⁾ = 0. By the construction of π and U, e is the only vector satisfying these equations.

Since WQW⁻¹ = D, the first row π^T of W is a left eigenvector of Q with eigenvalue zero and the first column e of W⁻¹ is a right eigenvector with eigenvalue zero.

It is necessary to check the sign pattern of Q since the procedure is not guaranteed to produce a generator. Strictly speaking, it is also necessary to check that the diagonal elements of Q are less than one, so that P = I − Q is stochastic. However, with the small values of 1 − ρ and σ we use here, that is not an issue.
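After a successful run of the script in the appendix, which leaves p and pi in the workspace, the construction can be checked along the following lines (a sketch; the expected eigenvalues assume ρ = 0.99):

    eig(p)                            % expect 1, rho, rho (up to rounding)
    norm(pi*p - pi)                   % pi is a left eigenvector for one
    norm(p*[1;1;1] - [1;1;1])         % the row sums of p are one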

Figure 3.1: log₁₀ ||P^k − eπ^T|| for ρ = 0.99, σ = 0.01

In a typical run with ρ = 0.99 and σ = 0.01, the procedure yielded the following matrix:

    P = [ 0.99425020008452   0.000082554309     0.00474797437240 ]
        [ 0.048687646        0.98848878048060   0.00002434835524 ]    (3.1)
        [ 0.000400746042     0.0023388059609    0.997260943488   ].

Figure 3.1 plots the common logarithm of the residual norm ||P^k − eπ^T||. The lower line is the plot of the residual norm for σ = 0, which effectively represents a pure ρ^k decay. The sluggish chain quickly settles into its local convergence mode, and the two lines diverge. Figure 3.2 plots the local 10-fold reduction time. After an initial increase it begins to converge slowly to its asymptotic value of about 230.
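The local convergence mode can also be read off numerically. The following sketch, which again assumes the workspace of the appendix script with ρ = 0.99, compares the observed one-step residual ratio with the prediction ρ(1 + 1/k) of §2; the two agree roughly once k satisfies (2.4):

    rho = 0.99; e = [1;1;1];
    r = zeros(1,500); pk = eye(3);
    for k = 1:500
      pk = pk*p;                      % running power P^k
      r(k) = norm(pk - e*pi);         % residual norm
    end
    k = 200;
    [r(k+1)/r(k), rho*(1 + 1/k)]      % observed vs. predicted local rate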

4. Discussion

Figure 3.2: Local 10-fold reduction time for ρ = 0.99, σ = 0.01

The above example shows that sluggish Markov chains exist and that their behavior differs markedly from the behavior predicted by the subdominant eigenvalue. Although we have presented only one example, it is typical of the other examples generated by our algorithm. The chief difference from example to example is in the behavior during the initial iterations, where the hump in the local 10-fold reduction time varies in size and can even be absent. In any event, the interested reader can try out the MATLAB code in the appendix.

It might be objected that the structure of the matrix (3.1) is rather special: it is a perturbation of the identity matrix. However, such are the matrices that approximate the slow transient of a nearly completely decomposable Markov chain [1].

Otherwise, the matrix itself gives no hint of its sluggish transient.

The value σ = 0.01 used in the example is decidedly subcritical. In fact, for σ = 0.012 (still subcritical) our algorithm was unable to generate a legitimate transition matrix, even after five thousand repetitions. This suggests that Markov chains of critical or supercritical sluggishness may not exist. The conjecture is worth further study.

References

[1] P.-J. Courtois. Decomposability. Academic Press, New York, 1977.

Appendix: MATLAB Code

    % Construct a 3x3 sluggish transition matrix p with steady-state
    % vector pi by imbedding the Jordan block of Section 2.
    rho = 0.99; sigma = 0.01;     % values used in the example of Section 3
    e = [1;1;1];
    d = [0, 0, 0; 0, 1-rho, -rho*sigma; 0, 0, 1-rho];
    i = 0;
    while (1 == 1)
      i = i+1
      [u,r] = qr([e, randn(3,2)]);  % columns 2:3 of u are orthonormal to e
      pi = rand(1,3);               % random positive row vector ...
      pi = pi/(pi*e);               % ... normalized so that pi*e = 1
      w = [pi; u(:,2:3)'];
      wi = inv(w);
      q = wi*d*w;
      if (q(1,1)>0 & q(2,2)>0 & q(3,3)>0 & q(1,2)<0 & q(1,3)<0 ...
          & q(2,1)<0 & q(2,3)<0 & q(3,1)<0 & q(3,2)<0), break; end
    end
    p = eye(3) - q
    clear x; clear y;
    for i = 1:500
      x(i) = log10(norm(p^i - e*pi));   % log10 of the residual norm
    end
    for i = 1:499
      y(i) = 1/(x(i) - x(i+1));         % local 10-fold reduction time
    end
    plot(x)