# An Introduction to Entropy and Subshifts of Finite Type


Abby Pekoske
Department of Mathematics, Oregon State University
August 4, 2015

## Abstract

This work gives an overview of subshifts of finite type and three measures of entropy on these systems. We develop Markov chain theory in order to define a subshift of finite type, and we briefly define both topological and measure-theoretic dynamical systems in more generality. The three types of entropy defined here are topological entropy, Shannon's entropy, and the Kolmogorov–Sinai entropy.

## Contents

1. Introduction
2. Markov Chains
3. Symbolic Dynamics
4. General Dynamical System
5. Shannon and Metric Entropy
6. Example

## Chapter 1: Introduction

Subshifts of finite type, or topological Markov shifts, are a class of symbolic dynamical systems that are widely studied in ergodic theory. A symbolic dynamical system is a sequence space on a finite set of symbols together with a shift map on this space. For example, if we consider the set of infinite sequences of 0s and 1s, $X = \{0,1\}^{\mathbb N}$, then the shift map $\sigma : X \to X$ shifts each element of the sequence one index to the left: $\sigma(x)_i = x_{i+1}$. A subshift of finite type is a closed, shift-invariant subset defined by an incidence matrix. An incidence matrix $A$ is a $(0,1)$-matrix that indicates which symbols are allowed to follow one another: for $i, j$ in a finite symbol set, $j$ is allowed to follow $i$ if $a_{ij} = 1$. There is a natural correspondence between subshifts of finite type and Markov chains. A Markov chain is a sequence of random variables defined on a common probability space taking values in a finite state space. Markov chains possess a memoryless property: the past states of the process have

no influence on the future states. Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space, and let $I$ be a countable set which denotes the state space. Let $\lambda = (\lambda_i : i \in I)$ denote a measure on $I$ with non-negative entries. Then, because $I$ is countable, $\lambda$ is a probability distribution if $\sum_{i \in I} \lambda_i = 1$. A random variable $X$ is a function on the probability space with values in the state space, $X : \Omega \to I$. If $\lambda$ is a distribution, then we write $\lambda_i = \mathbb P(X = i) = \mathbb P(\{\omega : X(\omega) = i\})$. A Markov chain is a sequence $(X_n)_{n \ge 0}$ with an initial distribution $\lambda$ and transition matrix $P$ such that $\mathbb P(X_0 = i_0) = \lambda_{i_0}$ and

$$\mathbb P(X_{n+1} = i_{n+1} \mid X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n) = \mathbb P(X_{n+1} = i_{n+1} \mid X_n = i_n) = p_{i_n, i_{n+1}}.$$

We will further explore the similarities between Markov chains and subshifts of finite type, starting with Markov chains as a foundation. We also define a general dynamical system in order to define three types of entropy on subshifts of finite type. From a dynamical systems perspective, entropy is a measure of the randomness in the system. From the perspective of information theory, entropy measures the average amount of information that the system can produce. The three types of entropy that we explore are topological entropy, Shannon's entropy, and the measure-theoretic entropy due to Kolmogorov and Sinai. We conclude this work with a comprehensive example: we begin with a Markov chain, which defines a subshift of finite type, and then show computations involving each of the three measures of entropy and interpret their meaning.
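The left shift map described above can be sketched directly on finite prefixes of such sequences (a minimal illustration; the function name is our own):

```python
# Minimal sketch of the left shift map sigma on {0,1}^N,
# acting on a finite prefix of an infinite sequence.
def shift(x):
    """sigma(x)_i = x_{i+1}: drop the first symbol."""
    return x[1:]

x = (0, 1, 0, 1, 0, 1)   # prefix of the alternating sequence
print(shift(x))          # (1, 0, 1, 0, 1)
print(shift(shift(x)))   # (0, 1, 0, 1)
```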

## Chapter 2: Markov Chains

At the very foundation of a subshift of finite type is a Markov chain. A Markov process in general is characterized by a memoryless property: the past states of the process have no influence on the future states. This means the conditional distribution of the next state given the current state is equal to the conditional distribution of the next state given all previous states. Markov chains are the subset of Markov processes which have only a countable set of states. The content of this section can be found in detail in [3]. As a setting for the following work, we let $(\Omega, \mathcal F, \mathbb P)$ be a probability space, and let $I$ be a countable set which denotes the state space. Let $\lambda = (\lambda_i : i \in I)$ denote a measure on $I$. Then, because $I$ is countable, $\lambda$ is a probability distribution if $\sum_{i \in I} \lambda_i = 1$. A random variable $X$ is a function on the probability space with values in the state space, $X : \Omega \to I$. If $\lambda$ is a distribution, then we write $\lambda_i = \mathbb P(X = i) = \mathbb P(\{\omega : X(\omega) = i\})$. We say a

matrix $P = (p_{i,j})_{i,j \in I}$ is stochastic if every row $(p_{i,j})_{j \in I}$ is a distribution. Let $P$ be a stochastic matrix. The process $(X_n)_{n \ge 0}$ is a Markov chain with initial distribution $\lambda$ if

(i) $X_0$ has initial distribution $\lambda$, i.e. $\mathbb P(X_0 = i_0) = \lambda_{i_0}$;

(ii) $\mathbb P(X_{n+1} = i_{n+1} \mid X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n) = \mathbb P(X_{n+1} = i_{n+1} \mid X_n = i_n) = p_{i_n, i_{n+1}}$.

The matrix $P$ is known as the transition matrix. Note that we only consider chains such that $\mathbb P(X_n = i \mid X_{n-1} = j) = \mathbb P(X_m = i \mid X_{m-1} = j)$ for all $m$ and $n$ and for all states $i$ and $j$. Because the chain has no dependence on $n$, we say that it is time-homogeneous. A Markov chain is thus a sequence of random variables with initial distribution $\lambda$ and transition matrix $P$, and the probability $\mathbb P(X_{n+1} = j \mid X_n = i)$ is given by the $(i,j)$ entry of the transition matrix, $p_{i,j}$. We abbreviate this by saying that $(X_n)_{n \ge 0}$ is Markov $(\lambda, P)$. Equivalently, a process is Markov if and only if the following proposition holds.

Proposition 2.0.1. A discrete-time random process $X_0, X_1, \ldots$ is Markov with initial distribution $\lambda$ and transition matrix $P$ if and only if for all $i_0, \ldots, i_N \in I$,

$$\mathbb P(X_0 = i_0, X_1 = i_1, \ldots, X_N = i_N) = \lambda_{i_0} p_{i_0,i_1} p_{i_1,i_2} \cdots p_{i_{N-1},i_N}.$$

Proof. The forward direction follows directly from the definition of a Markov chain. For the converse, assume that $\mathbb P(X_0 = i_0, \ldots, X_N = i_N) = \lambda_{i_0} p_{i_0,i_1} \cdots p_{i_{N-1},i_N}$ holds for $N$; then an induction argument shows it holds for all $n \le N$.

Let $\delta_i = (\delta_{ij} : j \in I)$ denote the Dirac delta distribution at $i$.
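The product formula of Proposition 2.0.1 is easy to evaluate numerically: the probability of a finite path is the initial weight times the product of the transition probabilities along it. A small sketch (the two-state chain and helper name are our own, for illustration; `fractions` keeps the arithmetic exact):

```python
from fractions import Fraction as F

# P(X_0 = i_0, ..., X_N = i_N) = lambda_{i_0} * p_{i_0 i_1} * ... * p_{i_{N-1} i_N}
def path_probability(lam, P, path):
    prob = lam[path[0]]
    for i, j in zip(path, path[1:]):
        prob *= P[i][j]
    return prob

lam = [F(1, 2), F(1, 2)]                    # initial distribution
P = [[F(1, 2), F(1, 2)],                    # stochastic transition matrix
     [F(3, 4), F(1, 4)]]
print(path_probability(lam, P, [0, 1, 0]))  # 1/2 * 1/2 * 3/4 = 3/16
```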

Proposition (Markov Property). Let $(X_n)_{n \ge 0}$ be Markov $(\lambda, P)$. Then conditional on $X_m = i$, $(X_{m+n})_{n \ge 0}$ is Markov $(\delta_i, P)$ and is independent of the random variables $X_0, \ldots, X_m$.

Proof. The proof follows from a straightforward application of the previous proposition. We first show that the claim holds for cylinder sets of the form $A = \{X_0 = i_0, \ldots, X_m = i_m\}$. Then, because any event determined by $X_0, \ldots, X_m$ is a countable, disjoint union $E = \bigcup_{k=1}^{\infty} A_k$ of such cylinder sets, the result holds in general.

There is a natural correspondence between Markov chains and directed graphs. Let $G_P$ denote the graph corresponding to a Markov process $(\lambda, P)$. The graph $G_P$ has a vertex for each state in $I$ and a directed edge from $i$ to $j$ if $p_{ij} > 0$. We can think of the initial distribution of the process as assigning probabilities or weights to the vertices, and the transition matrix as assigning transition probabilities to the edges between vertices.

Example. Consider the process $(X_n)_{n \ge 0}$ defined on the state space $I = \{1, 2, 3\}$, let $\lambda$ be some initial distribution, and let the transition matrix be

$$P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \end{pmatrix}.$$

The entry $p_{ij}$ represents the probability of transitioning from state $i$ to state $j$ in one time step. The initial distribution $\lambda$ represents a weighting of the nodes on the graph $G_P$. An initial distribution of $\lambda = (1/2, 1/8, 3/8)$ represents the probabilities $\mathbb P(X_0 = 1) = 1/2$, $\mathbb P(X_0 = 2) = 1/8$, $\mathbb P(X_0 = 3) = 3/8$.
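The vertex weights evolve by right-multiplying the initial row vector by the transition matrix. A sketch with plain lists (the three-state matrix here is illustrative and the helper names are ours; the initial distribution $(1/2, 1/8, 3/8)$ is the one above):

```python
from fractions import Fraction as F

def row_times_matrix(v, M):
    """Return the row vector v * M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def distribution_at(lam, P, n):
    """Distribution of X_n: the row vector lambda * P^n."""
    v = list(lam)
    for _ in range(n):
        v = row_times_matrix(v, P)
    return v

lam = [F(1, 2), F(1, 8), F(3, 8)]
P = [[F(0), F(1), F(0)],          # illustrative stochastic matrix
     [F(0), F(1, 2), F(1, 2)],
     [F(1, 2), F(0), F(1, 2)]]
print(distribution_at(lam, P, 1))  # [3/16, 9/16, 1/4]
```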

Theorem. Let $(X_n)_{n \ge 0}$ be Markov $(\lambda, P)$. Then for all $n, m \ge 0$,

(i) $\mathbb P(X_n = j) = (\lambda P^n)_j$;

(ii) $\mathbb P(X_n = j \mid X_0 = i) = \mathbb P(X_{n+m} = j \mid X_m = i) = p^{(n)}_{i,j}$.

Proof. (i) By Proposition 2.0.1,

$$\mathbb P(X_n = j) = \sum_{i_0 \in I} \cdots \sum_{i_{n-1} \in I} \mathbb P(X_0 = i_0, \ldots, X_{n-1} = i_{n-1}, X_n = j) = \sum_{i_0 \in I} \cdots \sum_{i_{n-1} \in I} \lambda_{i_0} p_{i_0 i_1} \cdots p_{i_{n-1} j} = (\lambda P^n)_j.$$

(ii) By the Markov property, conditional on $X_m = i$, $(X_{m+n})_{n \ge 0}$ is Markov $(\delta_i, P)$. Therefore, if we let $\lambda = \delta_i$ in (i), the result follows.

Intuitively, this theorem makes sense when considering a walk on a graph. This gives us the probability of landing on vertex $j$ in $n$ time steps and the

probability of reaching vertex $j$ from vertex $i$ in $n$ steps. Note here that $p^{(n)}_{i,j}$, the $(i,j)$ entry of $P^n$, represents the $n$-step transition probability from $i$ to $j$.

We say that $i$ leads to $j$ if $\mathbb P(X_n = j \mid X_0 = i) > 0$ for some $n \ge 0$. This corresponds to a directed path from vertex $i$ to vertex $j$. Then $i$ communicates with $j$ if both $i$ leads to $j$ and $j$ leads to $i$. The relation "communicates" is an equivalence relation on $I$ and partitions $I$ into communicating classes. Given a fixed vertex $i$, its communicating class is the set of vertices $j$ such that the graph contains walks both from $i$ to $j$ and from $j$ to $i$. A class $C$ is closed if $i \in C$ and $i$ leads to $j$ implies that $j \in C$. Finally, we say that a transition matrix $P$ is irreducible if $I$ is a single communicating class.

Let $\mathbb P_i(A)$ denote the probability that $A$ occurs given $X_0 = i$. We say a random variable $T : \Omega \to [0, \infty]$ is a stopping time if the event $\{T = n\}$ is a function only of $X_0, X_1, \ldots, X_n$ for $n < \infty$. For example, consider the process $(X_n)_{n \ge 0}$ on a discrete state space $I$. The first time that the process reaches state $i$ is known as the hitting time $\tau$, formally defined as $\tau = \min\{n : X_n = i\}$. The hitting time is a stopping time, because $\{\tau = n\} = \{X_0 \ne i, \ldots, X_{n-1} \ne i, X_n = i\}$ depends only on $X_0, \ldots, X_n$.

Theorem (Strong Markov Property). Let $(X_n)_{n \ge 0}$ be Markov $(\lambda, P)$, and let $T$ be a stopping time of the process $(X_n)_{n \ge 0}$. Then conditional on $T < \infty$ and $X_T = i$, $(X_{T+n})_{n \ge 0}$ is Markov $(\delta_i, P)$ and independent of $X_0, X_1, \ldots, X_T$.

Proof. Let $A$ be an event depending only on $X_0, X_1, \ldots, X_T$. Then the event $A \cap \{T = m\}$ is determined by $X_0, X_1, \ldots, X_m$. By the Markov property at time $m$,

$$\mathbb P(\{X_T = j_0, X_{T+1} = j_1, \ldots, X_{T+n} = j_n\} \cap A \cap \{T = m\} \cap \{X_T = i\}) = \mathbb P_i(X_0 = j_0, \ldots, X_n = j_n)\, \mathbb P(A \cap \{T = m\} \cap \{X_T = i\}).$$

By the definition of conditional probability, if we sum over $m = 0, 1, 2, \ldots$ and divide by the probability $\mathbb P(T < \infty, X_T = i)$, we find that

$$\mathbb P(\{X_T = j_0, X_{T+1} = j_1, \ldots, X_{T+n} = j_n\} \cap A \mid T < \infty, X_T = i) = \mathbb P_i(X_0 = j_0, \ldots, X_n = j_n)\, \mathbb P(A \mid T < \infty, X_T = i).$$

For a state $i \in I$, the probability that $X_n = i$ infinitely often is denoted by $\mathbb P(X_n = i \text{ i.o.})$. We say that state $i$ of a Markov process is recurrent if $\mathbb P_i(X_n = i \text{ i.o.}) = 1$ and transient if $\mathbb P_i(X_n = i \text{ i.o.}) = 0$. The first passage time $T_i$ is the first time that the process returns to state $i$: $T_i(\omega) = \inf\{n \ge 1 : X_n(\omega) = i\}$. The notion of passage time allows us to say that every state is either recurrent or transient.

Proposition. The following holds:

(i) If $\mathbb P_i(T_i < \infty) = 1$, then $i$ is recurrent, and $\sum_{n=0}^{\infty} p^{(n)}_{ii} = \infty$.

(ii) If $\mathbb P_i(T_i < \infty) < 1$, then $i$ is transient, and $\sum_{n=0}^{\infty} p^{(n)}_{ii} < \infty$.

Proof. The proof depends on the notion of the number of visits to a state, which is not covered in this work; it can be found in [3].

Note that this allows us to observe that every state is either transient or recurrent. The states of a communicating class are either all recurrent or all transient. Also,

every recurrent class is closed, and every finite closed class is recurrent. These conditions allow us to easily identify recurrent and transient states: the only recurrent states are those in the closed classes, and all others are transient.

Example. Consider a Markov chain $(X_n)$ on the state space $I = \{0, 1, 2, 3\}$ with some initial distribution $\lambda$ with non-zero entries. If the transition matrix is

$$P_1 = \begin{pmatrix} 1/4 & 0 & 3/4 & 0 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 1/3 & 2/3 \\ 0 & 0 & 2/3 & 1/3 \end{pmatrix},$$

then starting in states 0, 2, or 3, there is no way to transition to state 1 with positive probability; this is easily observable from the graph $G_{P_1}$. Thus, state 1 is transient and also not contained in a closed class. If we let

the transition matrix be

$$P_2 = \begin{pmatrix} 1/4 & 0 & 3/4 & 0 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 0 & 1/3 & 0 & 2/3 \end{pmatrix},$$

then every state is recurrent, and $I$ is a single closed class; this can be verified from the graph $G_{P_2}$.

Many of the asymptotic properties of Markov chains involve the idea of an invariant distribution. Recall that a measure $\lambda$ is a row vector $(\lambda_i)$ with non-negative entries. The measure $\lambda$ is invariant if $\lambda$ is a left eigenvector of the transition matrix $P$ with eigenvalue 1: $\lambda P = \lambda$. We also say that $\lambda$ is an equilibrium or stationary measure. The following proposition proves existence of invariant measures on a finite state space.

Proposition. Let $I$ be finite. Suppose for some $i \in I$ that $p^{(n)}_{ij} \to \pi_j$ as $n \to \infty$ for all $j \in I$. Then $\pi = (\pi_j : j \in I)$ is an invariant distribution.

Proof. In the following computation, because $I$ is finite, we can exchange summation and limits. Using the hypothesis and the definition of a distribution, we

compute

$$\sum_{j \in I} \pi_j = \sum_{j \in I} \lim_{n \to \infty} p^{(n)}_{ij} = \lim_{n \to \infty} \sum_{j \in I} p^{(n)}_{ij} = 1,$$

so $\pi$ is a distribution. To show it is invariant, we find

$$\pi_j = \lim_{n \to \infty} p^{(n+1)}_{ij} = \lim_{n \to \infty} \sum_{k \in I} p^{(n)}_{ik} p_{kj} = \sum_{k \in I} \lim_{n \to \infty} p^{(n)}_{ik} p_{kj} = \sum_{k \in I} \pi_k p_{kj}.$$

Recall that irreducibility means $I$ is a single communicating class: starting in a given state, the process can reach any other state of $I$ with positive probability.

Proposition. Let $P$ be irreducible and aperiodic, and suppose that $P$ has an invariant distribution $\pi$. Let $\lambda$ be any distribution, and suppose $(X_n)_{n \ge 0}$ is Markov $(\lambda, P)$. Then $\mathbb P(X_n = j) \to \pi_j$ as $n \to \infty$ for all $j$, and $p^{(n)}_{ij} \to \pi_j$ as $n \to \infty$ for all $i, j$.

Proof. A detailed proof of this proposition can be found in [3].

If the transition matrix $P$ is irreducible, then an equilibrium distribution exists if and only if the expected time it takes for the process to return to a particular state $i$ is finite. If an equilibrium distribution $\mu$ exists, then it is also unique.

A proof of this can be found in [3]. Note that a finite Markov chain whose state space is a single closed communicating class has an equilibrium distribution.
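For a finite chain satisfying the convergence hypotheses above, the equilibrium distribution can be approximated by iterating $\lambda P^n$ until it stabilizes. A sketch (the two-state matrix and the helper name are ours, for illustration):

```python
def stationary(P, tol=1e-12, max_iter=10_000):
    """Approximate the invariant distribution by iterating lambda * P^n."""
    n = len(P)
    v = [1.0 / n] * n                       # start from the uniform distribution
    for _ in range(max_iter):
        w = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v

P = [[0.5, 0.5],
     [0.75, 0.25]]
pi = stationary(P)
print(pi)  # approximately [0.6, 0.4]; one checks directly that pi * P = pi
```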

## Chapter 3: Symbolic Dynamics

We now have the tools to begin defining a symbolic dynamical system. The content of this section can be found in detail in Kitchens' *Symbolic Dynamics* [2]. The topic of main concern is the subshift of finite type, which can be constructed by modifying a Markov chain.

We first construct a sequence space and describe its topology. Consider the finite set $\mathcal A = \{0, 1, 2, \ldots, n\}$. This set is known as the alphabet and is analogous to the state space for Markov chains. We endow this set with the discrete topology and consider the sets of infinite and bi-infinite sequences. That is, if we let $\mathbb N$ denote the set $\{0, 1, 2, \ldots\}$ and $\mathbb Z$ denote the integers, then $\mathcal A^{\mathbb N}$ is the one-sided sequence space, with elements of the form $(.a_0 a_1 a_2 \ldots)$ for $a_i \in \mathcal A$, and $\mathcal A^{\mathbb Z}$ is the two-sided sequence space, with elements of the form $(\ldots a_{-2} a_{-1}.a_0 a_1 a_2 \ldots)$ for $a_i \in \mathcal A$. These sequence spaces are endowed with the product topology, and we will use the following metrics, which are equivalent to the product metrics. For $x, y \in \mathcal A^{\mathbb N}$, $d(x, y) = 1/2^k$, where $k$ marks the first place at which the sequences $x$ and $y$ differ: $k = \max\{m : x_i = y_i \text{ for } i < m\}$. Analogously, for $x, y \in \mathcal A^{\mathbb Z}$, $d(x, y) = 1/2^k$, where $k = \max\{m : x_i = y_i \text{ for } |i| < m\}$. We denote the spaces of infinite and bi-infinite sequences by $X_n$ and $\Sigma_n$ respectively.

We are mainly concerned with the dynamics of these spaces. Thus, we define the left shift transformation, which shifts the sequence elements by one index to the left. Formally, $\sigma : X_n \to X_n$ and $\sigma : \Sigma_n \to \Sigma_n$ by $\sigma(x)_i = x_{i+1}$. For example, if we consider the $(0,1)$-alternating sequence $x = (\ldots 0101.0101 \ldots)$ in $\mathcal A^{\mathbb Z}$, then the shift transformation acts on this sequence as $\sigma(x) = (\ldots 1010.1010 \ldots)$ and $\sigma^2(x) = (\ldots 0101.0101 \ldots) = x$, where $\sigma^k(x)$ denotes the $k$th iterate of the map $\sigma$. These spaces together with the shift transformation, $(X_n, \sigma)$ and $(\Sigma_n, \sigma)$, are known as the one-sided shift on $n+1$ symbols (the full one-sided $(n+1)$-shift) and the two-sided shift on $n+1$ symbols (the full $(n+1)$-shift). We are mainly concerned with characterizing long-term behavior of these dynamical systems under iteration of the shift map. We will mostly be working with the full one-sided shift; however, there are analogous definitions for the full two-sided shift as well. A point $x \in X_n$ is periodic if $\sigma^p(x) = x$ for some $p \in \mathbb N$. The point $x$ has period $p$ if $x$ is periodic and $p$ is the minimal nonzero power

satisfying $\sigma^p(x) = x$. For example, in $\{0, 1, 2\}^{\mathbb N}$, the point $x = (.012012012 \ldots)$ satisfies $\sigma^{3k}(x) = x$ for every $k \in \mathbb N$, and has period 3, because

$$\sigma(.012012\ldots) = (.120120\ldots), \quad \sigma^2(.012012\ldots) = (.201201\ldots), \quad \sigma^3(.012012\ldots) = (.012012\ldots).$$

A point is eventually periodic if, after a finite number of iterates, it is periodic. That is, $x \in X_n$ is eventually periodic if $\sigma^k(x) = \sigma^{k+p}(x)$ for some $k \ge 0$ and $p \in \mathbb N$.

The spaces of sequences we will be working with are subshifts of finite type, or SSFTs. There is a natural correspondence between an SSFT and a Markov chain. Consider a square $(0,1)$-matrix $A$. The matrix $A$ is the incidence matrix of a graph $G_A$ if there is an edge from vertex $i$ to $j$ if and only if $A_{ij} = 1$. Note that $A$ cannot have any complete rows or columns of zeros. The incidence matrix $A$ agrees with the transition matrix $P$ of a Markov chain if $p_{ij} > 0$ if and only if $a_{ij} = 1$. Recall that in order for $P$ to be a stochastic matrix, the row sums of $P$ must equal 1. Any matrix $P$ which agrees with $A$ is the transition matrix of a Markov chain with some initial distribution $\lambda$. Recall that a matrix $A$ is irreducible if, for a finite state space $I$, $I$ is a single communicating class, meaning the graph $G_A$ is strongly connected. Thus, the matrix $A$ is irreducible if for every pair of indices $i, j$ there exists a $k > 0$ with $(A^k)_{ij} > 0$.

The dynamical systems whose sequences are defined by an incidence matrix are SSFTs. Let $A$ be an incidence matrix with rows and columns indexed by

$\{0, 1, \ldots, n\}$. We let the matrix $A$ define the allowable sequences in $X_n$ as the set $\{x \in X_n : A_{x_i x_{i+1}} = 1 \text{ for all } i \in \mathbb N\}$. This is a closed, shift-invariant subset of $X_n$. We define the one-sided subshift of finite type defined by $A$, or one-sided topological Markov shift defined by $A$, as the dynamical system consisting of these sequences along with the restricted shift map. This dynamical system is denoted by $(X_A, \sigma)$. There is an analogous definition of a subshift of finite type for the full two-sided shift, denoted by $(\Sigma_A, \sigma)$.

Example. Consider the subshift of finite type $(X_A, \sigma)$ given by the incidence matrix

$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

This consists only of the sequences $(.0101\ldots)$ and $(.1010\ldots)$, because the subwords 00 and 11 are not allowed. This system is compatible with the Markov chain $(X_n)$ with any initial vector, say $\lambda = (1/5, 4/5)$, and transition matrix

$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

This is the only allowable transition matrix, because the entries must satisfy $p_{ij} > 0$ if and only if $a_{ij} = 1$, and $P$ must be a stochastic matrix.

Notice that the incidence matrix defines the allowable words or sequences; we can think of these as infinite walks on the graph associated with the transition matrix. Another way that a subshift of finite type can be defined is by a finite list of unallowable transitions. This list excludes every sequence that has these words as subwords, i.e. every sequence within a cylinder set

generated by the excluded list of words. In the previous example, the set of unallowable words is $\{00, 11\}$.

Recall that a matrix $A$ is irreducible and aperiodic if for some power $k$, $(A^k)_{ij} > 0$ for all $i, j$ in the state space.

Theorem (Perron–Frobenius Theorem). Suppose that $P$ is a non-negative, square matrix. If $P$ is irreducible and aperiodic, there exists a real eigenvalue $\lambda > 0$ such that:

(i) $\lambda$ is a simple root of the characteristic polynomial;

(ii) $\lambda$ has strictly positive left and right eigenvectors;

(iii) the eigenvectors of $\lambda$ are unique up to constant multiple;

(iv) $\lambda > |\mu|$, where $\mu$ is any other eigenvalue;

(v) if $0 \le B \le P$ (entry by entry) and $\beta$ is an eigenvalue for $B$, then $|\beta| \le \lambda$, and equality can occur if and only if $B = P$;

(vi) $\lim_{n \to \infty} P^n / \lambda^n = r l$, where $l$ and $r$ are the left and right eigenvectors for $P$, normalized so that $l \cdot r = 1$.

Proof. The heart of the proof is to show that there is a single eigenline passing through the first quadrant, and that every positive vector approaches this line under iteration by $P$. The Contraction Mapping Theorem is then used to show that there is a unique fixed point under iteration, which turns out to be the single eigenline in the first quadrant. This ensures that the eigenvalue associated with this line is strictly larger in modulus than the other eigenvalues.
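Returning to the subshift picture: membership of a finite word in the subshift defined by $A$ reduces to checking each adjacent pair of symbols against the incidence matrix. A sketch, using the matrix of the example above (forbidden subwords 00 and 11; the function name is ours):

```python
# Incidence matrix of the example: only the transitions 0->1 and 1->0 are allowed.
A = [[0, 1],
     [1, 0]]

def is_allowed(word, A):
    """Check that A[x_i][x_{i+1}] = 1 for every adjacent pair in the word."""
    return all(A[i][j] == 1 for i, j in zip(word, word[1:]))

print(is_allowed([0, 1, 0, 1], A))  # True: alternating words are allowed
print(is_allowed([0, 0, 1], A))     # False: the subword 00 is forbidden
```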

If the matrix $A$ is irreducible but not aperiodic, then there is another version of the Perron–Frobenius Theorem with slight changes: in part (iv), $\lambda > |\mu|$ becomes the non-strict inequality $\lambda \ge |\mu|$, and part (vi) of the theorem is dropped. The eigenvalue and eigenvector satisfying the above conditions are known as the Perron value and Perron vector. The Perron–Frobenius Theorem allows us to characterize the long-term behavior of the dynamical system just from the transition matrix.

An allowable block or word of length $k$ is a finite sequence of length $k$ that may appear in a sequence of the dynamical system. That is, $[i_0, \ldots, i_{k-1}]$ is an allowable word if $A_{i_r i_{r+1}} > 0$ for $r = 0, \ldots, k-2$. Notice that we can count these allowable words by iterating the incidence matrix: $(A^k)_{ij}$ gives the number of allowed words of length $k+1$ starting at $i$ and ending at $j$. We denote the set of allowable words of length $k$ by $W(A, k)$. For a given sequence $x \in X_n$, the complexity function $W_x(A, k)$ counts the number of subwords of length $k$ in $x$. We obtain the total set of allowed words by taking the union over all $k$ of the sets $W(A, k)$, and denote this by $W(A)$.

We now have the tools to define the first notion of entropy. For subshifts of finite type, topological entropy measures the exponential growth rate of the number of allowable words. Let $A$ be an incidence matrix and let $|W(A, k)|$ denote the cardinality of $W(A, k)$. Then the topological entropy of both the one-sided and the two-sided subshift of finite type defined by $A$ is

$$\lim_{k \to \infty} \frac{1}{k} \log |W(A, k)|.$$

Traditionally, the base of the logarithm is 2. Topological entropy is denoted by $h_{top}(X_A, \sigma)$ and $h_{top}(\Sigma_A, \sigma)$ for the shift acting on $X_A$ and $\Sigma_A$ respectively. Topological entropy can be thought of as a measure of randomness in the system: the larger the exponential growth rate of allowable words, the more randomness the dynamical system possesses. Alternatively, entropy may be thought of as the amount of information that is stored in the set of allowed sequences. We will see more of this notion when defining Shannon entropy and metric entropy.

Proposition 3.0.6. Let $A$ be an irreducible incidence matrix. Then $h_{top}(X_A, \sigma) = h_{top}(\Sigma_A, \sigma) = \log \lambda$, where $\lambda$ is the spectral radius of $A$; since $A$ is irreducible, $\lambda$ is the Perron value of $A$.

Proof. Note that $(A^k)_{ij}$ represents the number of allowed words of length $k+1$ beginning with $i$ and ending with $j$. Thus, the cardinality $|W(A, k)|$ is given by $\sum_{ij} (A^{k-1})_{ij}$. By the Perron–Frobenius Theorem, there exist constants $c$ and $d$ such that $c\lambda^k \le |W(A, k)| \le d\lambda^k$ for sufficiently large $k$. Taking the logarithm, dividing by $k$, and taking the limit as $k$ approaches infinity, we see that

$$\log \lambda \le \lim_{k \to \infty} \frac{1}{k} \log |W(A, k)| \le \log \lambda.$$

Note that both $\log c$ and $\log d$ disappear in the limit. Therefore, by the definition of topological entropy, $h_{top}(X_A, \sigma) = h_{top}(\Sigma_A, \sigma) = \log \lambda$.
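The proposition can be illustrated numerically. For the golden-mean shift, with incidence matrix $A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ (forbidding the subword 11), $|W(A,k)| = \sum_{ij}(A^{k-1})_{ij}$ follows the Fibonacci numbers, so $\frac1k \log_2 |W(A,k)|$ approaches $\log_2$ of the golden ratio, about 0.694. A sketch (this matrix and the helper names are our own, for illustration):

```python
import math

A = [[1, 1],
     [1, 0]]   # golden-mean shift: the subword 11 is forbidden

def mat_mult(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def count_words(A, k):
    """|W(A, k)|: number of allowed words of length k (k >= 2)."""
    M = A
    for _ in range(k - 2):
        M = mat_mult(M, A)
    return sum(sum(row) for row in M)      # sum_ij (A^{k-1})_ij

# (1/k) log2 |W(A, k)| approaches log2 of the golden ratio
phi = (1 + math.sqrt(5)) / 2
for k in (5, 20, 80):
    print(k, math.log2(count_words(A, k)) / k)
print(math.log2(phi))  # about 0.694
```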

Note that the result also holds for reducible matrices.

## Chapter 4: General Dynamical System

We now define a measure-theoretic dynamical system in order to define the last two notions of entropy. The content of this section can be found in detail in Fogg's *Substitutions in Dynamics, Arithmetics and Combinatorics* [1]. We will still be working with symbolic dynamical systems, just in greater generality.

We must define a topological dynamical system before defining a measure-theoretic one. A topological dynamical system $(X, T)$ is a compact metric space $X$ together with an onto, continuous map $T : X \to X$. Notice that for a given incidence matrix, the subshifts of finite type with the shift map, $(X_A, \sigma)$ and $(\Sigma_A, \sigma)$, satisfy the definition of a topological dynamical system: the spaces $X_A$ and $\Sigma_A$ are both compact under the metric $d(x, y) = 1/2^k$, where $k$ marks the first place at which $x$ and $y$ differ, and the associated map $\sigma$ is onto and continuous.

Let $f$ be a Borel measurable function defined on $X$. Then $f$ is $T$-invariant if $f(Tx) = f(x)$ for all $x \in X$. We want to endow the space $(X, T)$ with a

measure. Note that a topological dynamical system $(X, T)$ always has an invariant probability measure; this follows from Fogg's proof of the existence of an invariant measure on a compact metrizable space [1]. A measure-theoretic dynamical system is a system $(X, \mathcal B, \mu, T)$, where $(X, T)$ is a topological dynamical system, $\mu$ is a probability measure defined on the Borel measurable subsets of $X$, denoted by $\mathcal B$, and $T : X \to X$ is a measurable map which preserves the measure; that is, $\mu(T^{-1}B) = \mu(B)$ for all $B \in \mathcal B$. Notice that the subshift of finite type defined by an incidence matrix $A$ is a measure-theoretic dynamical system when we associate $A$ with the transition matrix $P$ of a Markov chain $(\lambda, P)$.

Example. Consider the full shift on $\{0, 1\}^{\mathbb N}$. Here, the incidence matrix

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$

defines the topological dynamical system $(X_A, \sigma)$, where $\sigma$ is the shift map $\sigma : X_A \to X_A$, $\sigma(x)_i = x_{i+1}$. The Borel sigma-algebra is generated by the cylinder sets discussed in Chapter 3, with a measure defined by a Markov chain. Consider the Markov chain $(X_n)$ with initial distribution $\lambda = (3/5, 2/5)$ and transition matrix

$$P = \begin{pmatrix} 1/2 & 1/2 \\ 3/4 & 1/4 \end{pmatrix}.$$

Notice that $\lambda P = \lambda$, so $\lambda$ is an equilibrium distribution for $(X_n)$. This Markov process defines a measure $\mu$ on $(X_A, \sigma)$ by $\mu([.x_0 x_1 x_2 \ldots x_n]) = \lambda_{x_0} p_{x_0 x_1} p_{x_1 x_2} \cdots p_{x_{n-1} x_n}$ for $x_i \in \{0, 1\}$ for all $i$. For example, if we consider

the cylinder set $[.001011]$, the measure of this set is

$$\mu([.001011]) = \frac{3}{5} \cdot \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{3}{4} \cdot \frac{1}{2} \cdot \frac{1}{4} = \frac{9}{640}.$$

Therefore, the one-sided full shift with measure $\mu$ and shift map $\sigma$ defines a measure-theoretic dynamical system $(X_A, \mathcal B, \mu, \sigma)$.
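The measure of a cylinder set is computed exactly like a Markov path probability. A sketch using the chain of this example (helper name is ours; `fractions` keeps the arithmetic exact):

```python
from fractions import Fraction as F

lam = [F(3, 5), F(2, 5)]                 # equilibrium initial distribution
P = [[F(1, 2), F(1, 2)],
     [F(3, 4), F(1, 4)]]

def cylinder_measure(word, lam, P):
    """mu([.x0 x1 ... xn]) = lambda_{x0} * p_{x0 x1} * ... * p_{x_{n-1} xn}."""
    m = lam[word[0]]
    for i, j in zip(word, word[1:]):
        m *= P[i][j]
    return m

print(cylinder_measure([0, 0, 1, 0, 1, 1], lam, P))  # 9/640
```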

## Chapter 5: Shannon and Metric Entropy

We now have the tools to define Shannon and metric entropy. The content of this section can be found in detail in [5], [4], or [6]. Both Shannon entropy and metric entropy quantify the uncertainty and information content of a symbolic dynamical system; the following discussion illustrates this idea.

Let $\alpha = \{A_1, A_2, \ldots, A_N\}$ be a measurable partition of the probability space $(X, \mathcal B, \mu)$, and let $T : X \to X$ be a measurable map on $X$. If we let $\alpha(x)$ be the element of $\alpha$ containing $x \in X$, then we define the itinerary of $x$ as the sequence $(\alpha(x), \alpha(Tx), \alpha(T^2 x), \ldots)$, which takes values in $\{A_1, A_2, \ldots, A_N\}$. Suppose $x$ is not known, but we know its itinerary through $\alpha(T^{n-1} x)$. To define both types of entropy, we wish to quantify how much uncertainty we have about the value $\alpha(T^n x)$.

Example. Let $S \subset \mathbb C$ denote the unit circle in the complex plane: $S = \{z \in \mathbb C : |z| = 1\}$. We will consider the transformation $T : S \to S$ defined by $T(z) = z^3$, which triples the angle on the unit circle. Consider

the partition $\alpha = \{A_0, A_1\}$, where $A_0 = \{e^{i\theta} : 0 \le \theta < \pi\}$ and $A_1 = \{e^{i\theta} : \pi \le \theta < 2\pi\}$. If $z \in S$ is unknown, then the itinerary $(0, 0, 0, 1, 1, \ldots)$ means that $z \in A_0$, $T(z) \in A_0$, $T^2(z) \in A_0$, $T^3(z) \in A_1$, $T^4(z) \in A_1$. If we know the first digits of the itinerary of $z$, what can we say about the $n$th digit? Knowing the first five digits of the itinerary tells us nothing new about the sixth, because we do not know the initial angle $\theta$: the sixth element is 0 or 1 with equal probability 1/2, so learning $\alpha(T^5(z))$ always yields one full bit of information.

Let $(X, \mathcal B, \mu)$ be a probability space, and suppose that $x \in X$ is unknown. How can we quantify the information content of the event "$x$ is in $A$"? Intuitively, we would like the information content $I(A)$ of an event to quantify the amount of uncertainty lost when we learn that $x \in A$. If the probability of the event $A$ occurring is large, then the uncertainty lost when learning that $x$ belongs to $A$ is small; in other words, $I(A)$ is large when $\mu(A)$ is small. The information content function is defined by the following three axioms:

(i) $I(A)$ is a continuous function of $\mu(A)$;

(ii) $I(A)$ is a non-negative, decreasing function of $\mu(A)$, and if $\mu(A) = 1$, then $I(A) = 0$;

(iii) if $A, B$ are independent, meaning $\mu(A \cap B) = \mu(A)\mu(B)$, then $I(A \cap B) = I(A) + I(B)$.

The only functions $\varphi : [0, 1] \to \mathbb R^+$ such that $I(A) = \varphi[\mu(A)]$ satisfies these three conditions for all probability spaces $(X, \mathcal B, \mu)$ are $\varphi(t) = c \ln t$ with $c < 0$. We are now able to define the information content and Shannon's notion of entropy. Let $(X, \mathcal B, \mu)$ be a probability space. The information content of a set $A \in \mathcal B$ is $I_\mu(A) = -\log \mu(A)$. The information function of a finite measurable partition $\alpha$ is

$$I_\mu(\alpha)(x) = \sum_{A \in \alpha} I_\mu(A)\, \mathbf 1_A(x) = -\sum_{A \in \alpha} \log \mu(A)\, \mathbf 1_A(x).$$

The entropy of a finite measurable partition $\alpha$ is the average value of the information content of the elements of the partition. Shannon's entropy is defined as

$$H_\mu(\alpha) = \int_X I_\mu(\alpha)\, d\mu = -\sum_{A \in \alpha} \mu(A) \log \mu(A).$$

Traditionally, log base 2 is used for defining information content, and we set $0 \log 0 = 0$. Shannon's entropy has several useful properties that are analogous to many properties of a probability measure. For each of the three definitions above, we can define a conditional version based upon conditional probabilities. Let $(X, \mathcal B, \mu)$ be a probability space, let $\alpha$ be a finite measurable partition, and suppose that $\mathcal F$ is a sub-$\sigma$-field of $\mathcal B$. Let $\mu(A \mid \mathcal F)(x)$ denote the conditional expectation of the indicator of $A$ given $\mathcal F$. The information content of $A$ given $\mathcal F$ is $I_\mu(A \mid \mathcal F)(x) := -\log \mu(A \mid \mathcal F)(x)$. The information function of $\alpha$ given $\mathcal F$ is $I_\mu(\alpha \mid \mathcal F) := \sum_{A \in \alpha} I_\mu(A \mid \mathcal F)\, \mathbf 1_A$.
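Shannon's entropy of a finite partition depends only on the measures of its elements. A sketch (log base 2, with the convention $0 \log 0 = 0$; the function name is ours):

```python
import math

def shannon_entropy(probs):
    """H = -sum p log2 p over the partition elements, with 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # 1.0 bit: a fair coin
print(shannon_entropy([1.0]))        # 0.0: a certain event carries no information
print(shannon_entropy([0.25] * 4))   # 2.0 bits: four equally likely outcomes
```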

The conditional entropy of $\alpha$ given $\mathcal F$ is $H_\mu(\alpha \mid \mathcal F) := \int_X I_\mu(\alpha \mid \mathcal F)\, d\mu$.

Let $\alpha, \beta$ be two countable partitions of a common probability space $X$. If every element of $\alpha$ is equal, up to a set of $\mu$-measure 0, to a union of elements of $\beta$, then we say that $\alpha$ is coarser than $\beta$, and $\beta$ is finer than $\alpha$. We define $\alpha \vee \beta = \{A \cap B : A \in \alpha \text{ and } B \in \beta\}$, the smallest partition that is finer than both $\alpha$ and $\beta$. Similarly, if $\mathcal F_1$ and $\mathcal F_2$ are both $\sigma$-algebras, then $\mathcal F_1 \vee \mathcal F_2$ is the smallest $\sigma$-algebra which contains both $\mathcal F_1$ and $\mathcal F_2$.

Let $(X, \mathcal B, \mu, T)$ be a measure-theoretic dynamical system. The following definition is due to Kolmogorov and Sinai. Metric entropy is defined as

$$h_\mu = \sup\{h_\mu(T, \alpha) : \alpha \text{ is a countable measurable partition such that } H_\mu(\alpha) < \infty\},$$

where

$$h_\mu(T, \alpha) = \lim_{n \to \infty} \frac{1}{n} H_\mu\!\left( \bigvee_{i=0}^{n-1} T^{-i} \alpha \right).$$

The limit defining $h_\mu(T, \alpha)$ exists; a proof of this relies on the subadditivity of Shannon entropy and can be found in [6]. The Shannon entropy $H_\mu\big(\bigvee_{i=0}^{n-1} T^{-i}\alpha\big)$ gives the average information content in the first $n$ digits of the $\alpha$-itinerary. When we divide by $n$, this gives the average information per unit time. Therefore, we can think of metric entropy as measuring the maximal rate of information production the system is capable of generating.

Theorem 5.0.4. Let $\mu$ be the shift-invariant measure for a Markov chain with transition matrix $P = (p_{ij})$ and invariant initial distribution $(p_i)$. Then $h_\mu(\sigma) = -\sum_{i,j} p_i p_{ij} \log p_{ij}$.

Proof. Let $\alpha$ be the partition generated by the cylinder sets $\{[a] : a \in I\}$ and

let $\alpha_0^{n-1}$ denote the refinement $\bigvee_{i=0}^{n-1} \sigma^{-i} \alpha$. Then a computation shows that

$$H_\mu(\alpha_0^{n-1}) = -(n-1) \sum_{i,j} p_i p_{ij} \log p_{ij} - \sum_i p_i \log p_i.$$

Dividing by $n$ and taking the limit as $n$ approaches infinity, we find

$$\lim_{n \to \infty} \frac{1}{n} H_\mu(\alpha_0^{n-1}) = \lim_{n \to \infty} \left( -\frac{n-1}{n} \sum_{i,j} p_i p_{ij} \log p_{ij} - \frac{1}{n} \sum_i p_i \log p_i \right) = -\sum_{i,j} p_i p_{ij} \log p_{ij}.$$

Because $\alpha$ generates the Borel sigma-algebra, Sinai's Generator Theorem in [5] gives $h_\mu(\sigma) = h_\mu(\sigma, \alpha) = -\sum_{i,j} p_i p_{ij} \log p_{ij}$.

## Chapter 6: Example

Consider the Markov chain $(X_n)$ with initial distribution $\lambda = (1/2, 1/4, 1/4)$ and transition matrix

$$P = \begin{pmatrix} 1/4 & 3/4 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$

Notice that every state in $I = \{0, 1, 2\}$ is recurrent, because the state space of the Markov chain $(\lambda, P)$ is a single closed class, meaning we can transition from a given state to any other state with positive probability.

The corresponding incidence matrix for a subshift of finite type is

$$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix},$$

with dynamical system $(X_A, \sigma)$ under the shift operator. We see that $A$ is irreducible with period 1, and is therefore aperiodic. This incidence matrix defines a shift with the set of unallowable subwords $\{02, 10, 11, 21, 22\}$. We see that $(.000\ldots)$ is the only fixed point, because the only length-1 loop is at state 0. There are no points of period 2, because the shortest complete walk on the graph $G_A$, other than around the 0-loop, has length 3. We can create periodic points of any period greater than or equal to 3 as follows. Note that $(.012012\ldots)$ has period 3. Because 00 is an allowable subword, we can create a point of period $k \ge 3$ of the form

$$(.\underbrace{0 \cdots 0}_{k-2}\, 12\, \underbrace{0 \cdots 0}_{k-2}\, 12 \ldots).$$
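These claims about periodic points can be verified by brute force: a word of length $p$ defines a period-$p$ point exactly when every cyclically adjacent pair is an allowed transition. A sketch, using the incidence matrix of this example (the helper names are ours):

```python
from itertools import product

# Incidence matrix of the example: allowed transitions 00, 01, 12, 20.
A = [[1, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]

def is_periodic_word(w, A):
    """A word w of length p defines a period-p point iff every cyclically
    adjacent pair (including the wrap-around) is an allowed transition."""
    return all(A[w[i]][w[(i + 1) % len(w)]] == 1 for i in range(len(w)))

def periodic_words(p, A):
    return [w for w in product(range(3), repeat=p) if is_periodic_word(w, A)]

print(periodic_words(1, A))       # [(0,)]: the only fixed point is (.000...)
print(periodic_words(2, A))       # [(0, 0)]: no point has minimal period 2
print(len(periodic_words(3, A)))  # 4: (0,0,0) plus the three rotations of 012
```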

The Markov chain (λ, P) defines a measure µ on the shift space (X_A, σ). Consider the cylinder sets [000], [012], and [1200]. Then

    µ([000]) = 1/2 · 1/4 · 1/4 = 1/32,
    µ([012]) = 1/2 · 3/4 · 1 = 3/8,
    µ([1200]) = 1/4 · 1 · 1 · 1/4 = 1/16,

because µ([a_0 a_1 a_2 ... a_n]) = λ_{a_0} p_{a_0 a_1} p_{a_1 a_2} ... p_{a_{n−1} a_n}.

By Proposition 3.0.6, to compute topological entropy we need to find the Perron root of the incidence matrix A. We compute the characteristic polynomial of A by finding the determinant det(A − xI), where I is the identity matrix. Thus, A has characteristic polynomial c_A(x) = −x³ + x² + 1. We see that the roots of c_A are approximately −0.2328 + 0.7926i, −0.2328 − 0.7926i, and 1.4656, all of which are eigenvalues. By the Perron–Frobenius theorem, because A is a nonnegative, square, irreducible, and aperiodic matrix, the Perron root is the real eigenvalue, which we will denote by ψ ≈ 1.4656. The Perron vector is the associated eigenvector v ≈ (1.4656, 0.6823, 1).

We can now compute the topological entropy of (X_A, σ). This is defined

as the growth rate of the number of allowable subwords per unit time:

    h_top(X_A, σ) = lim_{k→∞} (1/k) log W(A, k) = log ψ ≈ 0.5515,

where W(A, k) denotes the number of allowable words of length k, using log base 2.

In order to compute Shannon's entropy, we must consider a partition of X_A. Consider the partition into length-2 cylinder sets, α_1 = {[00], [01], [12], [20]}, and call these partition elements A_1, A_2, A_3, A_4, respectively. Then, using the measure µ defined by (λ, P), we find

    µ(A_1) = 1/8,  µ(A_2) = 3/8,  µ(A_3) = 1/4,  µ(A_4) = 1/4.

Recall that for a finite partition α, Shannon's entropy is defined as H(α) = −∑_{A∈α} µ(A) log µ(A). So we compute Shannon's entropy of the partition α_1:

    H(α_1) = −1/8 log 1/8 − 3/8 log 3/8 − 1/4 log 1/4 − 1/4 log 1/4 ≈ 1.906,

using log base 2. Now consider the partition into length-3 cylinder sets, α_2 = {[000], [001], [012], [120], [200], [201]}. We find the measure of each partition element using the measure defined by the Markov chain (λ, P):

    µ(A_1) = 1/32,  µ(A_2) = 3/32,  µ(A_3) = 3/8,  µ(A_4) = 1/4,  µ(A_5) = 1/16,  µ(A_6) = 3/16.

Again, we find Shannon's entropy of this new partition:

    H(α_2) = −∑_{A∈α_2} µ(A) log µ(A)
           = −1/32 log 1/32 − 3/32 log 3/32 − 3/8 log 3/8 − 1/4 log 1/4 − 1/16 log 1/16 − 3/16 log 3/16
           ≈ 2.210.

Because Shannon's entropy is a measure of the average information content of the elements of a partition, it makes sense that α_2 has higher entropy than α_1; the partition elements of length three provide more information on average than the partition elements of length two. We would

be slightly better at guessing a word from the system knowing the first three symbols rather than the first two. However, because we can consider partitions of any length, Shannon's entropy is unbounded over partitions. Thus an entropy value of approximately two is a small value.

Last, we find the metric entropy of the system (X_A, σ). We must find an equilibrium distribution for the system defined by (λ, P); that is, a measure π such that πP = π. Using

    P = [ 1/4  3/4   0
           0    0    1
           1    0    0 ],

and solving this system of linear equations, we find that π = (2/5, 3/10, 3/10). Note that the entries of π sum to one, because π defines a probability measure. By Theorem 5.0.4 we find

    h_µ(σ) = −∑_{i,j} π_i p_ij log p_ij
           = −(2/5 · 1/4 log 1/4 + 2/5 · 3/4 log 3/4 + 3/10 · 1 · log 1 + 3/10 · 1 · log 1)
           = 1/5 + 3/10 log 4/3
           ≈ 0.3245,

using log base 2; terms with p_ij = 0 vanish by the convention 0 log 0 = 0. This value of entropy represents the maximal rate of information production that the system is capable of generating. Low entropy is not a surprising result: the system is highly predictable, because the only variability within a word is the placement of 12 between runs of 0's of variable length. This is because, according to the incidence matrix A, a 2 must follow a 1, and a 0 must follow a 2.

This example is an illustration of the ways in which a Markov chain and the very structure of the subshift of finite type affect the measures of entropy. We see that topological entropy measures the exponential growth rate of the number of allowable words in a system; it is an invariant defined on a topological system. Shannon's entropy, on the other hand, is highly dependent on the measure associated with the measure-theoretic system, and it measures the average amount of information of elements of a partition. Measure-theoretic entropy is defined using an invariant measure, and gives a measure

of the average amount of information production that the system is capable of generating.

Given the foundation of subshifts of finite type built upon Markov chains, we would like to explore more general dynamical systems. According to the variational principle, the topological entropy of a system is equal to the supremum, over all invariant measures, of the measure-theoretic entropy. The theory which develops this result is highly interesting, but beyond the scope of this paper. Future work would include learning more of this theory, and exploring other types of dynamical systems, including substitutions.
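The numerical values in this chapter can be reproduced with a short script. The sketch below is in plain Python; the Perron root is found by bisection on the characteristic polynomial, and the partition measures and stationary distribution are the ones computed above:

```python
import math

# Topological entropy: log2 of the Perron root, the real root of x^3 - x^2 - 1.
lo, hi = 1.0, 2.0                        # f(1) = -1 < 0 and f(2) = 3 > 0
for _ in range(60):                      # bisection to machine precision
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mid**3 - mid**2 - 1 < 0 else (lo, mid)
psi = (lo + hi) / 2
print(round(psi, 4), round(math.log2(psi), 4))          # ~ 1.4656, ~ 0.5515

# Shannon entropy of a partition, from the measures of its elements.
H = lambda ws: -sum(w * math.log2(w) for w in ws)
print(round(H([1/8, 3/8, 1/4, 1/4]), 3))                # H(alpha_1) ~ 1.906
print(round(H([1/32, 3/32, 3/8, 1/4, 1/16, 3/16]), 3))  # H(alpha_2) ~ 2.21

# Metric entropy from the stationary distribution, with 0 log 0 = 0.
P = [[1/4, 3/4, 0], [0, 0, 1], [1, 0, 0]]
pi = [2/5, 3/10, 3/10]
h = -sum(pi[i] * P[i][j] * math.log2(P[i][j])
         for i in range(3) for j in range(3) if P[i][j] > 0)
print(round(h, 4))                                      # ~ 0.3245
```

The output agrees with the closed forms: h_top = log ψ and h_µ(σ) = 1/5 + (3/10) log(4/3).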

Bibliography

[1] N. P. Fogg and V. Berthé. Substitutions in Dynamics, Arithmetics and Combinatorics. Lecture Notes in Mathematics. Springer.

[2] B. P. Kitchens. Symbolic Dynamics: One-sided, Two-sided and Countable State Markov Shifts. Universitext. Springer Berlin Heidelberg.

[3] J. R. Norris. Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.

[4] Daniel J. Rudolph. Fundamentals of Measurable Dynamics: Ergodic Theory on Lebesgue Spaces. Oxford: Clarendon Press.

[5] O. Sarig. Lecture Notes on Ergodic Theory. url: math.psu.edu/sarig/506/ergodicnotes.pdf.

[6] P. Walters. An Introduction to Ergodic Theory. Graduate Texts in Mathematics. Springer New York. url: https://books.google.com/books?id=ecoufop7onmc.
