Convergence of random processes

DS-GA 1002 Lecture notes 6, Fall 2016

1 Introduction

In these notes we study convergence of discrete random processes. This allows us to characterize phenomena such as the law of large numbers, the central limit theorem and the convergence of Markov chains, which are fundamental in statistical estimation and probabilistic modeling.

2 Types of convergence

Convergence for a deterministic sequence of real numbers $x_1, x_2, \ldots$ is simple to define:

$$\lim_{i \to \infty} x_i = x \qquad (1)$$

if $x_i$ is arbitrarily close to $x$ as $i$ grows. More formally, for any $\epsilon > 0$ there is an $i_0$ such that for all $i > i_0$, $|x_i - x| < \epsilon$. This allows us to define convergence for a realization of a discrete random process $\widetilde{X}(\omega, i)$, i.e. when we fix the outcome $\omega$ so that $\widetilde{X}(\omega, i)$ is just a deterministic function of $i$.

It is more challenging to define convergence of a random process to a random variable $X$, since both of these objects are only defined through their distributions. In this section we describe several alternative definitions of convergence for random processes.

2.1 Convergence with probability one

Consider a discrete random process $\widetilde{X}$ and a random variable $X$ defined on the same probability space. If we fix an element $\omega$ of the sample space $\Omega$, then $\widetilde{X}(\omega, i)$ is a deterministic sequence and $X(\omega)$ is a constant. It is consequently possible to verify whether $\widetilde{X}(\omega, i)$ converges deterministically to $X(\omega)$ as $i \to \infty$ for that particular value of $\omega$. In fact, we can ask: what is the probability that this happens? To be precise, this would be the probability that if we draw $\omega$ we have

$$\lim_{i \to \infty} \widetilde{X}(\omega, i) = X(\omega). \qquad (2)$$

If this probability equals one then we say that $\widetilde{X}(i)$ converges to $X$ with probability one.

Definition 2.1 (Convergence with probability one). A discrete random process $\widetilde{X}$ converges with probability one to a random variable $X$ belonging to the same probability space $(\Omega, \mathcal{F}, P)$

Figure 1: Convergence to zero of the discrete random process $\widetilde{D}$ defined in Example 2.2 of Lecture Notes 5.

if

$$P\left(\left\{\omega \,\middle|\, \omega \in \Omega, \ \lim_{i \to \infty} \widetilde{X}(\omega, i) = X(\omega)\right\}\right) = 1. \qquad (3)$$

Recall that in general the sample space $\Omega$ is very difficult to define and manipulate explicitly, except for very simple cases.

Example 2.2 (Puddle (continued)). Let us consider the discrete random process $\widetilde{D}$ defined in Example 2.2 of Lecture Notes 5. If we fix $\omega \in (0, 1)$,

$$\lim_{i \to \infty} \widetilde{D}(\omega, i) = \lim_{i \to \infty} \frac{\omega}{i} \qquad (4)$$
$$= 0. \qquad (5)$$

It turns out that the realizations tend to zero for all possible values of $\omega$ in the sample space. This implies that $\widetilde{D}$ converges to zero with probability one.

2.2 Convergence in mean square and in probability

To verify convergence with probability one we fix the outcome $\omega$ and check whether the corresponding realizations of the random process converge deterministically. An alternative viewpoint is to fix the indexing variable $i$ and consider how close the random variable $\widetilde{X}(i)$ is to another random variable $X$ as we increase $i$.

A possible measure of the distance between two random variables is the mean square of their difference. Recall that if $E\left((X - Y)^2\right) = 0$ then $X = Y$ with probability one by Chebyshev's inequality. The mean square deviation between $\widetilde{X}(i)$ and $X$ is a deterministic quantity (a number), so we can evaluate its convergence as $i \to \infty$. If it converges to zero then we say that the random sequence converges in mean square.

Definition 2.3 (Convergence in mean square). A discrete random process $\widetilde{X}$ converges in mean square to a random variable $X$ belonging to the same probability space if

$$\lim_{i \to \infty} E\left(\left(X - \widetilde{X}(i)\right)^2\right) = 0. \qquad (6)$$

Alternatively, we can consider the probability that $\widetilde{X}(i)$ is separated from $X$ by a certain fixed $\epsilon > 0$. If for any $\epsilon$, no matter how small, this probability converges to zero as $i \to \infty$ then we say that the random sequence converges in probability.

Definition 2.4 (Convergence in probability). A discrete random process $\widetilde{X}$ converges in probability to another random variable $X$ belonging to the same probability space if for any $\epsilon > 0$

$$\lim_{i \to \infty} P\left(\left|X - \widetilde{X}(i)\right| > \epsilon\right) = 0. \qquad (7)$$

Note that as in the case of convergence in mean square, the limit in this definition is deterministic: it is a limit of probabilities, which are just real numbers.

As a direct consequence of Markov's inequality, convergence in mean square implies convergence in probability.

Theorem 2.5. Convergence in mean square implies convergence in probability.

Proof. We have

$$\lim_{i \to \infty} P\left(\left|X - \widetilde{X}(i)\right| > \epsilon\right) = \lim_{i \to \infty} P\left(\left(X - \widetilde{X}(i)\right)^2 > \epsilon^2\right) \qquad (8)$$
$$\leq \lim_{i \to \infty} \frac{E\left(\left(X - \widetilde{X}(i)\right)^2\right)}{\epsilon^2} \quad \text{by Markov's inequality} \qquad (9)$$
$$= 0, \qquad (10)$$

if the sequence converges in mean square.

It turns out that convergence with probability one also implies convergence in probability. However, convergence with probability one does not imply convergence in mean square, or vice versa. The difference between these three types of convergence is not very important for the purposes of this course.
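These notions can be explored numerically. The following sketch (our own code, not part of the notes) estimates the mean square deviation of Definition 2.3 by Monte Carlo for the puddle-style process $\widetilde{X}(\omega, i) = \omega/i$ of Example 2.2, whose limit is $X = 0$; the function name and sample sizes are hypothetical choices.

```python
import random

# Hypothetical Monte Carlo check (not from the notes): estimate the mean
# square deviation E[(X - X~(i))^2] from Definition 2.3 for the puddle-style
# process X~(omega, i) = omega / i, with omega uniform on (0, 1) and limit
# X = 0. Analytically, E[X~(i)^2] = E[omega^2] / i^2 = 1 / (3 i^2) -> 0.

def mean_square_deviation(i, n_samples=100_000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        omega = rng.random()         # draw an outcome omega ~ Uniform(0, 1)
        total += (omega / i) ** 2    # squared deviation from the limit X = 0
    return total / n_samples

for i in [1, 10, 100]:
    print(i, mean_square_deviation(i))  # shrinks roughly like 1/(3 i^2)
```

The estimate for $i = 1$ should be near $1/3$ and decrease with $i$, matching convergence in mean square (and hence in probability, by Theorem 2.5).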

2.3 Convergence in distribution

In some cases, a random process $\widetilde{X}$ does not converge to the value of any random variable, but the pdf or pmf of $\widetilde{X}(i)$ converges pointwise to the pdf or pmf of another random variable $X$. In that case, the actual values of $\widetilde{X}(i)$ and $X$ will not necessarily be close, but in the limit they have the same distribution. We say that $\widetilde{X}$ converges in distribution to $X$.

Definition 2.6 (Convergence in distribution). A discrete-state discrete random process $\widetilde{X}$ converges in distribution to a discrete random variable $X$ belonging to the same probability space if

$$\lim_{i \to \infty} p_{\widetilde{X}(i)}(x) = p_X(x) \quad \text{for all } x \in R_X, \qquad (11)$$

where $R_X$ is the range of $X$.

A continuous-state discrete random process $\widetilde{X}$ converges in distribution to a continuous random variable $X$ belonging to the same probability space if

$$\lim_{i \to \infty} f_{\widetilde{X}(i)}(x) = f_X(x) \quad \text{for all } x \in \mathbb{R}, \qquad (12)$$

assuming the pdfs are well defined (otherwise we can use the cdfs¹).

Note that convergence in distribution is a much weaker notion than convergence with probability one, in mean square or in probability. If a discrete random process $\widetilde{X}$ converges to a random variable $X$ in distribution, this only means that as $i$ becomes large the distribution of $\widetilde{X}(i)$ tends to the distribution of $X$, not that the values of the two random variables are close. However, convergence in probability (and hence convergence with probability one or in mean square) does imply convergence in distribution.

Example 2.7 (Binomial converges to Poisson). Let us define a discrete random process $\widetilde{X}$ such that the distribution of $\widetilde{X}(i)$ is binomial with parameters $i$ and $p := \lambda/i$. $\widetilde{X}(i)$ and $\widetilde{X}(j)$ are independent for $i \neq j$, which completely characterizes the $n$th-order distributions of the process for all $n > 1$. Consider a Poisson random variable $X$ with parameter $\lambda$ that is independent of $\widetilde{X}(i)$ for all $i$. Do you expect the values of $X$ and $\widetilde{X}(i)$ to be close as $i \to \infty$? No! In fact, even $\widetilde{X}(i)$ and $\widetilde{X}(i+1)$ will not be close in general. However, $\widetilde{X}$ converges in

¹One can also define convergence in distribution of a discrete-state random process to a continuous random variable through the deterministic convergence of the cdfs.

distribution to $X$, as established in Example 3.7 of Lecture Notes 2:

$$\lim_{i \to \infty} p_{\widetilde{X}(i)}(x) = \lim_{i \to \infty} \binom{i}{x} p^x (1-p)^{i-x} \qquad (13)$$
$$= \frac{\lambda^x e^{-\lambda}}{x!} \qquad (14)$$
$$= p_X(x). \qquad (15)$$

3 Law of Large Numbers

Let us define the average of a discrete random process.

Definition 3.1 (Moving average). The moving or running average $\widetilde{A}$ of a discrete random process $\widetilde{X}$, defined for $i = 1, 2, \ldots$ (i.e. 1 is the starting point), is equal to

$$\widetilde{A}(i) := \frac{1}{i} \sum_{j=1}^{i} \widetilde{X}(j). \qquad (16)$$

Consider an iid sequence. A very natural interpretation of the moving average is that it is a real-time estimate of the mean. In fact, in statistical terms the moving average is the empirical mean of the process up to time $i$ (we will discuss the empirical mean later on in the course). The notorious law of large numbers establishes that the average does indeed converge to the mean of the iid sequence.

Theorem 3.2 (Weak law of large numbers). Let $\widetilde{X}$ be an iid discrete random process with mean $\mu_{\widetilde{X}} := \mu$ such that the variance of $\widetilde{X}(i)$, $\sigma^2$, is bounded. Then the average $\widetilde{A}$ of $\widetilde{X}$ converges in mean square to $\mu$.

Proof. First, we establish that the mean of $\widetilde{A}(i)$ is constant and equal to $\mu$:

$$E\left(\widetilde{A}(i)\right) = E\left(\frac{1}{i} \sum_{j=1}^{i} \widetilde{X}(j)\right) \qquad (17)$$
$$= \frac{1}{i} \sum_{j=1}^{i} E\left(\widetilde{X}(j)\right) \qquad (18)$$
$$= \mu. \qquad (19)$$

Due to the independence assumption, the variance scales linearly in $i$. Recall that for independent random variables the variance of the sum equals the sum of the variances:

$$\mathrm{Var}\left(\widetilde{A}(i)\right) = \mathrm{Var}\left(\frac{1}{i} \sum_{j=1}^{i} \widetilde{X}(j)\right) \qquad (20)$$
$$= \frac{1}{i^2} \sum_{j=1}^{i} \mathrm{Var}\left(\widetilde{X}(j)\right) \qquad (21)$$
$$= \frac{\sigma^2}{i}. \qquad (22)$$

We conclude that

$$\lim_{i \to \infty} E\left(\left(\widetilde{A}(i) - \mu\right)^2\right) = \lim_{i \to \infty} E\left(\left(\widetilde{A}(i) - E\left(\widetilde{A}(i)\right)\right)^2\right) \quad \text{by (19)} \qquad (23)$$
$$= \lim_{i \to \infty} \mathrm{Var}\left(\widetilde{A}(i)\right) \qquad (24)$$
$$= \lim_{i \to \infty} \frac{\sigma^2}{i} \quad \text{by (22)} \qquad (25)$$
$$= 0. \qquad (26)$$

By Theorem 2.5 the average also converges to the mean of the iid sequence in probability. In fact, one can also prove convergence with probability one under the same assumptions. This result is known as the strong law of large numbers, but the proof is beyond the scope of these notes. We refer the interested reader to more advanced texts in probability theory.

Figure 2 shows averages of realizations of several iid sequences. When the iid sequence is Gaussian or geometric we observe convergence to the mean of the distribution; however, when the sequence is Cauchy the moving average diverges. The reason is that, as we saw in Example 3.2 of Lecture Notes 4, the Cauchy distribution does not have a well defined mean! Intuitively, extreme values have non-negligible probability under the Cauchy distribution, so from time to time the iid sequence takes values with very large magnitudes, and this makes the moving average diverge.

4 Central Limit Theorem

In the previous section we established that the moving average of a sequence of iid random variables converges to the mean of their distribution (as long as the mean is well defined and the variance is finite). In this section, we characterize the distribution of the average $\widetilde{A}(i)$

Figure 2: Realizations of the moving average of an iid standard Gaussian sequence (top), an iid geometric sequence with parameter $p = 0.4$ (center) and an iid Cauchy sequence (bottom).
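The experiment behind Figure 2 can be reproduced with a short script. This is our own sketch (plotting omitted, parameters hypothetical except for $p = 0.4$ from the figure): the moving average of an iid standard Gaussian sequence approaches 0, and that of an iid geometric sequence with $p = 0.4$ approaches its mean $1/p = 2.5$.

```python
import math
import random

# Sketch of the experiment in Figure 2 (our own code): compute running
# averages A~(i) = (1/i) sum_{j<=i} X~(j) for two iid sequences.

def moving_average(samples):
    """Return the running averages A~(1), A~(2), ... of a sequence."""
    averages, total = [], 0.0
    for i, x in enumerate(samples, start=1):
        total += x
        averages.append(total / i)
    return averages

rng = random.Random(1)
n = 50_000
gauss_avg = moving_average([rng.gauss(0.0, 1.0) for _ in range(n)])

# Geometric samples (number of trials until the first success) drawn by
# inverse-transform sampling.
p = 0.4
geom_avg = moving_average(
    [int(math.log(rng.random()) / math.log(1.0 - p)) + 1 for _ in range(n)]
)

print(gauss_avg[-1])  # close to the mean 0
print(geom_avg[-1])   # close to the mean 1/p = 2.5
```

Replacing either sequence with Cauchy samples (e.g. the tangent of a uniform angle) makes the running average wander without settling, as in the bottom row of Figure 2.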

as $i$ increases. It turns out that $\widetilde{A}$ converges to a Gaussian random variable in distribution, which is very useful in statistics as we will see later on. This result, known as the central limit theorem, justifies the use of Gaussian distributions to model data that are the result of many different independent factors. For example, the distribution of height or weight of people in a certain population often has a Gaussian shape, as illustrated by Figure 1 of Lecture Notes 2, because the height and weight of a person depend on many different factors that are roughly independent. In many signal-processing applications, noise is well modeled as having a Gaussian distribution for the same reason.

Theorem 4.1 (Central Limit Theorem). Let $\widetilde{X}$ be an iid discrete random process with mean $\mu_{\widetilde{X}} := \mu$ such that the variance of $\widetilde{X}(i)$, $\sigma^2$, is bounded. The random process $\sqrt{i}\left(\widetilde{A}(i) - \mu\right)$, which corresponds to the centered and scaled moving average of $\widetilde{X}$, converges in distribution to a Gaussian random variable with mean 0 and variance $\sigma^2$.

Proof. The proof of this remarkable result is beyond the scope of these notes. It can be found in any advanced text on probability theory. However, we would still like to provide some intuition as to why the theorem holds.

In Theorem 3.18 of Lecture Notes 3, we established that the pdf of the sum of two independent random variables is equal to the convolution of their individual pdfs. The same holds for discrete random variables: the pmf of the sum is equal to the convolution of the pmfs, as long as the random variables are independent. If each of the entries of the iid sequence has pdf $f$, then the pdf of the sum of the first $i$ elements can be obtained by convolving $f$ with itself $i$ times,

$$f_{\sum_{j=1}^{i} \widetilde{X}(j)}(x) = (f * f * \cdots * f)(x). \qquad (27)$$

If the sequence has a discrete state and each of the entries has pmf $p$, the pmf of the sum of the first $i$ elements can be obtained by convolving $p$ with itself $i$ times,

$$p_{\sum_{j=1}^{i} \widetilde{X}(j)}(x) = (p * p * \cdots * p)(x). \qquad (28)$$

Normalizing by $i$ just results in scaling the result of the convolution, so the pmf or pdf of the moving average $\widetilde{A}$ is the result of repeated convolutions of a fixed function. These convolutions have a smoothing effect, which eventually transforms the pmf/pdf into a Gaussian! We show this numerically in Figure 3 for two very different distributions: a uniform distribution and a very irregular one. Both converge to Gaussian-like shapes after just 3 or 4 convolutions. The central limit theorem makes this precise, establishing that the shape of the pmf or pdf becomes Gaussian asymptotically.

In statistics the central limit theorem is often invoked to justify treating averages as if they have a Gaussian distribution. The idea is that for large enough $n$, $\sqrt{n}\left(\widetilde{A}(n) - \mu\right)$ is

Figure 3: Result of convolving two different distributions with themselves several times ($i = 1, \ldots, 5$). The shapes quickly become Gaussian-like.
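The smoothing effect shown in Figure 3 is easy to reproduce. The sketch below (our own code; the irregular pmf is a hypothetical example) repeatedly convolves a spiky pmf with itself, as in equation (28), which quickly flattens its spikes into a Gaussian-like bell shape.

```python
# Sketch of the smoothing effect behind Figure 3 (our own code): repeated
# self-convolution of a pmf gives the pmf of a sum of iid copies.

def convolve(p, q):
    """Discrete convolution of two pmfs given as lists of probabilities."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

pmf = [0.4, 0.05, 0.3, 0.05, 0.2]  # irregular pmf on {0, ..., 4} (hypothetical)
current = pmf[:]
for _ in range(4):                 # pmf of the sum of 5 iid copies
    current = convolve(current, pmf)

print(current)  # still sums to 1, but the 0.4 spike has been smoothed away
```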

Figure 4: Empirical distribution of the moving average of an iid exponential sequence with parameter $\lambda = 2$ (top), an iid geometric sequence with parameter $p = 0.4$ (center) and an iid Cauchy sequence (bottom), for $i = 10^2, 10^3, 10^4$. The empirical distribution is computed from $10^4$ samples in all cases. For the first two rows the estimate provided by the central limit theorem is plotted in red.

approximately Gaussian with mean 0 and variance $\sigma^2$, which implies that $\widetilde{A}(n)$ is approximately Gaussian with mean $\mu$ and variance $\sigma^2/n$. It is important to remember that we have not established this rigorously. The rate of convergence depends on the particular distribution of the entries of the iid sequence; in practice, convergence is usually very fast.

Figure 4 shows the empirical distribution of the moving average of an exponential and a geometric iid sequence. In both cases the approximation obtained from the central limit theorem is very accurate even for an average of 100 samples. The figure also shows that for a Cauchy iid sequence the distribution of the moving average does not become Gaussian, which does not contradict the central limit theorem, since the Cauchy distribution does not have a well defined mean.

To close this section we derive a useful approximation to the binomial distribution using the central limit theorem.

Example 4.2 (Gaussian approximation to the binomial distribution). Let $X$ have a binomial distribution with parameters $n$ and $p$, such that $n$ is large. Computing the probability that $X$ is in a certain interval requires summing its pmf over all the values in that interval. Alternatively, we can obtain a quick approximation using the fact that for large $n$ the distribution of a binomial random variable is approximately Gaussian. Indeed, we can write $X$ as the sum of $n$ independent Bernoulli random variables with parameter $p$,

$$X = \sum_{i=1}^{n} B_i. \qquad (29)$$

The mean of $B_i$ is $p$ and its variance is $p(1-p)$. By the central limit theorem, $\frac{1}{n} X$ is approximately Gaussian with mean $p$ and variance $p(1-p)/n$. Equivalently, by Lemma 6.1 in Lecture Notes 2, $X$ is approximately Gaussian with mean $np$ and variance $np(1-p)$.

Assume that a basketball player makes each shot she takes with probability $p = 0.4$. If we assume that each shot is independent, what is the probability that she makes more than 420 shots out of 1000? We can model the number of shots made as a binomial $X$ with parameters 1000 and 0.4. The exact answer is

$$P(X \geq 420) = \sum_{x=420}^{1000} p_X(x) \qquad (30)$$
$$= \sum_{x=420}^{1000} \binom{1000}{x}\, 0.4^x\, 0.6^{1000-x} \qquad (31)$$
$$= 1.04 \cdot 10^{-1}. \qquad (32)$$

If we apply the Gaussian approximation, by Lemma 6.1 in Lecture Notes 2, $X$ being larger than 420 is the same as a standard Gaussian $U$ being larger than $\frac{420 - \mu}{\sigma}$, where $\mu$ and $\sigma$ are the

mean and standard deviation of $X$, equal to $np = 400$ and $\sqrt{np(1-p)} = 15.5$ respectively:

$$P(X \geq 420) \approx P\left(\sqrt{np(1-p)}\, U + np \geq 420\right) \qquad (33)$$
$$= P(U \geq 1.29) \qquad (34)$$
$$= 1 - \Phi(1.29) \qquad (35)$$
$$= 9.85 \cdot 10^{-2}. \qquad (36)$$

5 Convergence of Markov chains

In this section we study under what conditions a finite-state time-homogeneous Markov chain $\widetilde{X}$ converges in distribution. If a Markov chain converges in distribution, then its state vector $p_{\widetilde{X}(i)}$, which contains the first-order pmf of $\widetilde{X}$, converges to a fixed vector $p_{\infty}$,

$$p_{\infty} := \lim_{i \to \infty} p_{\widetilde{X}(i)}. \qquad (37)$$

This implies that the probability of the Markov chain being in each state tends to a specific value. By Lemma 4.1 in Lecture Notes 5, we can express (37) in terms of the initial state vector and the transition matrix of the Markov chain:

$$p_{\infty} = \lim_{i \to \infty} T_{\widetilde{X}}^{i}\, p_{\widetilde{X}(0)}. \qquad (38)$$

Computing this limit analytically for a particular $T_{\widetilde{X}}$ and $p_{\widetilde{X}(0)}$ may seem challenging at first sight. However, it is often possible to leverage the eigendecomposition of the transition matrix (if it exists) to find $p_{\infty}$. This is illustrated in the following example.

Example 5.1 (Mobile phones). A company that makes mobile phones wants to model the sales of a new model they have just released. At the moment 90% of the phones are in stock, 10% have been sold locally and none have been exported. Based on past data, the company determines that each day a phone in stock is sold with probability 0.2 and exported with probability 0.1. With the states ordered (in stock, sold, exported), the initial state vector and the transition matrix of the Markov chain are

$$a := \begin{bmatrix} 0.9 \\ 0.1 \\ 0 \end{bmatrix}, \qquad T_{\widetilde{X}} = \begin{bmatrix} 0.7 & 0 & 0 \\ 0.2 & 1 & 0 \\ 0.1 & 0 & 1 \end{bmatrix}. \qquad (39)$$
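As a quick numerical companion to this example (our own sketch, not from the notes), we can propagate the state vector of the phone chain by repeated matrix-vector products:

```python
# Numerical check of Example 5.1 (our own sketch): propagate the state vector
# p_X(i) = T^i a. States are ordered (in stock, sold, exported) as in
# equation (39); each column of T sums to one.

T = [[0.7, 0.0, 0.0],
     [0.2, 1.0, 0.0],
     [0.1, 0.0, 1.0]]
a = [0.9, 0.1, 0.0]  # initial state vector

def step(T, p):
    """One transition of the chain: returns the matrix-vector product T p."""
    n = len(p)
    return [sum(T[r][c] * p[c] for c in range(n)) for r in range(n)]

p = a[:]
for i in range(100):
    p = step(T, p)
    assert abs(sum(p) - 1.0) < 1e-9  # the state vector remains a valid pmf

print(p)  # the in-stock mass decays like 0.7^i
```

The printed vector matches the limit computed analytically below via the eigendecomposition.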

Figure 5: State diagram of the Markov chain described in Example 5.1 (top). Below we show three realizations of the Markov chain.

Figure 6: Evolution of the state vector of the Markov chain in Example 5.1 for different values of the initial state vector $p_{\widetilde{X}(0)}$.

We have used $a$ to denote $p_{\widetilde{X}(0)}$ because later we will consider other possible initial state vectors. Figure 5 shows the state diagram and some realizations of the Markov chain. The company is interested in the fate of the new model. In particular, it would like to compute what fraction of mobile phones will end up exported and what fraction will be sold locally. This is equivalent to computing

$$\lim_{i \to \infty} p_{\widetilde{X}(i)} = \lim_{i \to \infty} T_{\widetilde{X}}^{i}\, p_{\widetilde{X}(0)} \qquad (40)$$
$$= \lim_{i \to \infty} T_{\widetilde{X}}^{i}\, a. \qquad (41)$$

The transition matrix $T_{\widetilde{X}}$ has three eigenvectors,

$$q_1 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \qquad q_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \qquad q_3 := \begin{bmatrix} 0.8 \\ -0.53 \\ -0.27 \end{bmatrix}. \qquad (42)$$

The corresponding eigenvalues are $\lambda_1 := 1$, $\lambda_2 := 1$ and $\lambda_3 := 0.7$. We gather the eigenvectors and eigenvalues into two matrices,

$$Q := \begin{bmatrix} q_1 & q_2 & q_3 \end{bmatrix}, \qquad \Lambda := \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix}, \qquad (43)$$

so that the eigendecomposition of $T_{\widetilde{X}}$ is

$$T_{\widetilde{X}} := Q \Lambda Q^{-1}. \qquad (44)$$

It will be useful to express the initial state vector $a$ in terms of the different eigenvectors. This is achieved by computing

$$Q^{-1} p_{\widetilde{X}(0)} = \begin{bmatrix} 0.3 \\ 0.7 \\ 1.122 \end{bmatrix}, \qquad (45)$$

so that

$$a = 0.3\, q_1 + 0.7\, q_2 + 1.122\, q_3. \qquad (46)$$

We conclude that

$$\lim_{i \to \infty} T_{\widetilde{X}}^{i}\, a = \lim_{i \to \infty} T_{\widetilde{X}}^{i} \left(0.3\, q_1 + 0.7\, q_2 + 1.122\, q_3\right) \qquad (47)$$
$$= \lim_{i \to \infty} 0.3\, T_{\widetilde{X}}^{i}\, q_1 + 0.7\, T_{\widetilde{X}}^{i}\, q_2 + 1.122\, T_{\widetilde{X}}^{i}\, q_3 \qquad (48)$$
$$= \lim_{i \to \infty} 0.3\, \lambda_1^{i}\, q_1 + 0.7\, \lambda_2^{i}\, q_2 + 1.122\, \lambda_3^{i}\, q_3 \qquad (49)$$
$$= \lim_{i \to \infty} 0.3\, q_1 + 0.7\, q_2 + 1.122 \cdot 0.7^{i}\, q_3 \qquad (50)$$
$$= 0.3\, q_1 + 0.7\, q_2 \qquad (51)$$
$$= \begin{bmatrix} 0 \\ 0.7 \\ 0.3 \end{bmatrix}. \qquad (52)$$

This means that eventually the probability that each phone has been sold locally is 0.7 and the probability that it has been exported is 0.3. The left graph in Figure 6 shows the evolution of the state vector. As predicted, it eventually converges to the vector in equation (52).

In general, because of the special structure of the two eigenvectors with eigenvalue equal to one in this example, we have

$$\lim_{i \to \infty} T_{\widetilde{X}}^{i}\, p_{\widetilde{X}(0)} = \begin{bmatrix} 0 \\ \left(Q^{-1} p_{\widetilde{X}(0)}\right)_2 \\ \left(Q^{-1} p_{\widetilde{X}(0)}\right)_1 \end{bmatrix}. \qquad (53)$$

This is illustrated in Figure 6, where you can see the evolution of the state vector if it is initialized to these other two distributions:

$$b := \begin{bmatrix} 0.6 \\ 0 \\ 0.4 \end{bmatrix}, \qquad Q^{-1} b = \begin{bmatrix} 0.6 \\ 0.4 \\ 0.75 \end{bmatrix}, \qquad (54)$$

$$c := \begin{bmatrix} 0.4 \\ 0.5 \\ 0.1 \end{bmatrix}, \qquad Q^{-1} c = \begin{bmatrix} 0.23 \\ 0.77 \\ 0.5 \end{bmatrix}. \qquad (55)$$
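The eigendecomposition computation in this example can be verified in a few lines. This is our own consistency check (the published entries of $q_3$ are rounded, so the identities hold up to that rounding):

```python
# Consistency check of the eigendecomposition in Example 5.1 (our own
# sketch, pure Python). q3's entries are rounded to two decimals in the
# notes, so we only expect agreement up to that rounding.

q1 = [0.0, 0.0, 1.0]      # eigenvalue 1
q2 = [0.0, 1.0, 0.0]      # eigenvalue 1
q3 = [0.8, -0.53, -0.27]  # eigenvalue 0.7

T = [[0.7, 0.0, 0.0],
     [0.2, 1.0, 0.0],
     [0.1, 0.0, 1.0]]

def matvec(T, v):
    return [sum(T[r][c] * v[c] for c in range(3)) for r in range(3)]

# T q3 should equal 0.7 q3.
Tq3 = matvec(T, q3)
print([Tq3[k] - 0.7 * q3[k] for k in range(3)])  # all close to zero

# The initial state vector should decompose as 0.3 q1 + 0.7 q2 + 1.122 q3.
recon = [0.3 * q1[k] + 0.7 * q2[k] + 1.122 * q3[k] for k in range(3)]
print(recon)  # close to a = (0.9, 0.1, 0)
```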

The transition matrix of the Markov chain in Example 5.1 has two eigenvectors with eigenvalue equal to one. If we set the initial state vector equal to either of these eigenvectors (note that we must make sure to normalize them so that the state vector contains a valid pmf), then

$$T_{\widetilde{X}}\, p_{\widetilde{X}(0)} = p_{\widetilde{X}(0)}, \qquad (56)$$

so that

$$p_{\widetilde{X}(i)} = T_{\widetilde{X}}^{i}\, p_{\widetilde{X}(0)} \qquad (57)$$
$$= p_{\widetilde{X}(0)} \qquad (58)$$

for all $i$. In particular,

$$\lim_{i \to \infty} p_{\widetilde{X}(i)} = p_{\widetilde{X}(0)}, \qquad (59)$$

so $\widetilde{X}$ converges in distribution to a random variable with pmf $p_{\widetilde{X}(0)}$. A distribution that satisfies (59) is called a stationary distribution of the Markov chain.

Definition 5.2 (Stationary distribution). Let $\widetilde{X}$ be a finite-state time-homogeneous Markov chain and let $p_{\text{stat}}$ be a state vector containing a valid pmf over the possible states of $\widetilde{X}$. If $p_{\text{stat}}$ is an eigenvector associated to an eigenvalue equal to one, so that

$$T_{\widetilde{X}}\, p_{\text{stat}} = p_{\text{stat}}, \qquad (60)$$

then the distribution corresponding to $p_{\text{stat}}$ is a stationary or steady-state distribution of $\widetilde{X}$.

Establishing whether a distribution is stationary by checking whether (60) holds may be challenging computationally if the state space is very large. We now derive an alternative condition that implies stationarity. Let us first define reversibility of Markov chains.

Definition 5.3 (Reversibility). Let $\widetilde{X}$ be a finite-state time-homogeneous Markov chain with $s$ states and transition matrix $T_{\widetilde{X}}$. Assume that $\widetilde{X}(i)$ is distributed according to the state vector $p \in \mathbb{R}^{s}$. If

$$P\left(\widetilde{X}(i) = x_j,\ \widetilde{X}(i+1) = x_k\right) = P\left(\widetilde{X}(i) = x_k,\ \widetilde{X}(i+1) = x_j\right) \quad \text{for all } 1 \leq j, k \leq s, \qquad (61)$$

then we say that $\widetilde{X}$ is reversible with respect to $p$. This is equivalent to the detailed-balance condition

$$\left(T_{\widetilde{X}}\right)_{kj} p_j = \left(T_{\widetilde{X}}\right)_{jk} p_k \quad \text{for all } 1 \leq j, k \leq s. \qquad (62)$$
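The detailed-balance condition is easy to check mechanically. The sketch below uses a hypothetical 3-state birth-death chain (our own example, not from the notes) to verify (62) and to confirm numerically that it implies $T_{\widetilde{X}}\, p = p$, i.e. stationarity:

```python
# Hypothetical 3-state birth-death chain (not from the notes): check the
# detailed-balance condition (62) and confirm that T p = p follows,
# illustrating the reversibility-implies-stationarity result.

T = [[0.5, 0.25, 0.0],  # column j holds the transition probabilities out of j
     [0.5, 0.5, 0.5],
     [0.0, 0.25, 0.5]]
p = [0.25, 0.5, 0.25]   # candidate reversible distribution

def satisfies_detailed_balance(T, p, tol=1e-12):
    s = len(p)
    return all(
        abs(T[k][j] * p[j] - T[j][k] * p[k]) < tol
        for j in range(s)
        for k in range(s)
    )

Tp = [sum(T[r][c] * p[c] for c in range(3)) for r in range(3)]
print(satisfies_detailed_balance(T, p))  # True
print(Tp)                                # equals p, so p is stationary
```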

As proved in the following theorem, reversibility implies stationarity, but the converse does not hold: a Markov chain is not necessarily reversible with respect to a stationary distribution (and often won't be). The detailed-balance condition therefore only provides a sufficient condition for stationarity.

Theorem 5.4 (Reversibility implies stationarity). If a time-homogeneous Markov chain $\widetilde{X}$ is reversible with respect to a distribution $p_{\widetilde{X}}$, then $p_{\widetilde{X}}$ is a stationary distribution of $\widetilde{X}$.

Proof. Let $p$ be the state vector containing $p_{\widetilde{X}}$. By assumption, $T_{\widetilde{X}}$ and $p$ satisfy (62), so for $1 \leq j \leq s$

$$\left(T_{\widetilde{X}}\, p\right)_j = \sum_{k=1}^{s} \left(T_{\widetilde{X}}\right)_{jk} p_k \qquad (63)$$
$$= \sum_{k=1}^{s} \left(T_{\widetilde{X}}\right)_{kj} p_j \qquad (64)$$
$$= p_j \sum_{k=1}^{s} \left(T_{\widetilde{X}}\right)_{kj} \qquad (65)$$
$$= p_j. \qquad (66)$$

The last step follows from the fact that the columns of a valid transition matrix must add up to one (the chain always has to go somewhere).

In Example 5.1 the Markov chain has two stationary distributions. It turns out that this is not possible for irreducible Markov chains.

Theorem 5.5. Irreducible Markov chains have a single stationary distribution.

Proof. This follows from the Perron-Frobenius theorem, which states that the transition matrix of an irreducible Markov chain has a single eigenvector with eigenvalue equal to one and nonnegative entries.

If, in addition, the Markov chain is aperiodic, then it is guaranteed to converge in distribution to a random variable with its stationary distribution for any initial state vector. Such Markov chains are called ergodic.

Theorem 5.6 (Convergence of Markov chains). If a discrete-time time-homogeneous Markov chain $\widetilde{X}$ is irreducible and aperiodic, its state vector converges to the stationary distribution $p_{\text{stat}}$ of $\widetilde{X}$ for any initial state vector $p_{\widetilde{X}(0)}$. This implies that $\widetilde{X}$ converges in distribution to a random variable with pmf given by $p_{\text{stat}}$.
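Theorem 5.6 can be observed directly on a small example. The sketch below uses a hypothetical 2-state chain (our own example, not from the notes) that is irreducible and aperiodic, so the state vector converges to the unique stationary distribution from any initial pmf:

```python
# Sketch of Theorem 5.6 on a hypothetical 2-state chain (our own example).
# The chain is irreducible and aperiodic; its stationary distribution,
# obtained by solving T p = p, is (2/3, 1/3).

T = [[0.9, 0.2],
     [0.1, 0.8]]  # columns sum to one

def evolve(T, p, steps):
    """Apply the transition matrix `steps` times to the state vector p."""
    n = len(p)
    for _ in range(steps):
        p = [sum(T[r][c] * p[c] for c in range(n)) for r in range(n)]
    return p

starts = ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5])
limits = [evolve(T, p0, 200) for p0 in starts]
print(limits)  # every row is close to (2/3, 1/3)
```

The second eigenvalue of this matrix is 0.7, so the non-stationary component decays like $0.7^i$, mirroring the role of $\lambda_3$ in Example 5.1.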

Figure 7: Evolution of the state vector of the Markov chain in Example 5.7 for three different initial state vectors.

The proof of this result is beyond the scope of the course.

Example 5.7 (Car rental (continued)). The Markov chain in the car rental example is irreducible and aperiodic. We will now check that it indeed converges in distribution. Its transition matrix has the following eigenvectors:

$$q_1 := \begin{bmatrix} 0.273 \\ 0.545 \\ 0.182 \end{bmatrix}, \qquad q_2 := \begin{bmatrix} 0.577 \\ -0.789 \\ 0.211 \end{bmatrix}, \qquad q_3 := \begin{bmatrix} 0.577 \\ 0.211 \\ -0.789 \end{bmatrix}. \qquad (67)$$

The corresponding eigenvalues are $\lambda_1 := 1$, $\lambda_2 := 0.573$ and $\lambda_3 := 0.227$. As predicted by Theorem 5.5, the Markov chain has a single stationary distribution. For any initial state vector, the component that is collinear with $q_1$ is preserved by the transitions of the Markov chain, but the other two components become negligible after a while. The chain consequently converges in distribution to a random variable with pmf $q_1$ (note that $q_1$ has been normalized to be a valid pmf), as predicted by Theorem 5.6. This is illustrated in Figure 7. No matter how the company allocates the new cars, eventually 27.3% will end up in San Francisco, 54.5% in LA and 18.2% in San Jose.