Shannon Meets Lyapunov: Connections between Information Theory and Dynamical Systems
Proceedings of the 44th IEEE Conference on Decision and Control, and the European Control Conference 2005, Seville, Spain, December 12-15, 2005

Tim Holliday (Princeton/Bell Labs), Peter Glynn (Stanford University), Andrea Goldsmith (Stanford University)

Abstract: This paper explores connections between information theory, Lyapunov exponents for products of random matrices, and hidden Markov models. Specifically, we show that entropies associated with finite-state channels are equivalent to Lyapunov exponents. We use this result to show that the traditional prediction filter for hidden Markov models is not an irreducible Markov chain in our problem framework. Hence, we do not have access to many well-known properties of irreducible continuous state space Markov chains (e.g., a unique and continuous stationary distribution). However, by exploiting the connection between entropy and Lyapunov exponents, and applying proof techniques from the theory of random matrix products, we can solve a broad class of problems related to capacity and hidden Markov models. Our results provide strong regularity results for the non-irreducible prediction filter, as well as some novel theoretical tools for addressing problems in these areas.

I. INTRODUCTION

In this paper we explore connections between Shannon entropy and Lyapunov exponents for products of random matrices. Specifically, we examine some of the unique problems, solution techniques, and insights that arise from the connection between these seemingly disparate bodies of theory. In [10] we focused on using this connection to compute entropy and Lyapunov exponents for finite-state channels. In this paper we address some of the surprising theoretical problems that arise when examining the connections between these areas.
Perhaps most significantly, we show that the standard prediction filter associated with hidden Markov models (HMMs) is not an irreducible Markov chain. Hence, much of the standard theory for continuous state space Markov chains cannot be applied to ensure results that are often taken for granted. For example, lack of irreducibility prevents automatic access to a unique stationary distribution, rates of convergence to stationarity, or continuity of the stationary distribution for the HMM prediction filter. We then show that the connection between Lyapunov exponents and Shannon entropy gives us access to a new set of tools that allows us to address these non-irreducibility problems. In particular, we show that many convergence results for products of random matrices can be applied to ensure strong convergence results for the HMM prediction filter (even though the filter is not irreducible). Many of these strong convergence results require a fair amount of technical detail, for which we refer to [9]. Our goal in this paper is to summarize the theoretical results and to present some of the tools we use to solve these problems. In Section II we present our first result, which shows that the symbol entropies associated with a finite-state channel are equivalent to the Lyapunov exponents associated with a particular class of random matrix products. Section III presents our second set of results, which show that the entropy associated with finite-state Markov chains (with no channel state information) may be computed as an expectation with respect to the stationary distribution of a particular Markov chain. This Markov chain is closely related to the well-known projective product in chaotic dynamical systems, or the prediction filter in HMMs. In Section III.B we reach the rather startling conclusion that the prediction filter is typically an extremely poorly behaved process. Specifically, the filter does not satisfy even the weakest form of irreducibility (i.e.,
Harris recurrence), and it possesses infinite memory. In Sections IV and V we present our third set of results, showing that we can combine theory from random matrix products and Markov chains to resolve many of the difficulties described above. Specifically, by exploiting the contraction property of positive matrices, we give conditions under which the prediction filter has a unique stationary distribution, exponential rates of convergence to steady-state, and continuity of the stationary distribution with respect to the channel input distribution and channel transition probabilities.

II. MARKOV CHANNELS WITH ERGODIC INPUTS

Consider a communication channel with (channel) state sequence $C = (C_n : n \ge 0)$, input symbol sequence $X = (X_n : n \ge 0)$, and output symbol sequence $Y = (Y_n : n \ge 0)$. The channel states take values in $\mathcal{C}$, whereas the input and output symbols take values in $\mathcal{X}$ and $\mathcal{Y}$, respectively. In this paper, we adopt the notational convention that if $s = (s_n : n \ge 0)$ is any generic sequence, then for $m, n \ge 0$, $s_m^{m+n} = (s_m, \ldots, s_{m+n})$ denotes the finite segment of $s$ starting at index $m$ and ending at index $m+n$.

A. Channel Model Assumptions

In this section (and throughout the rest of this paper), we will assume that:

A1: $C = (C_n : n \ge 0)$ is a stationary finite-state irreducible Markov chain, possessing transition matrix $R = (R(c_n, c_{n+1}) : c_n, c_{n+1} \in \mathcal{C})$. In particular,

$$P(C_0^n = c_0^n) = r(c_0) \prod_{j=1}^{n} R(c_{j-1}, c_j)$$

©2005 IEEE. Authorized licensed use limited to: Stanford University. Downloaded from IEEE Xplore. Restrictions apply.
for $c_0^n \in \mathcal{C}^{n+1}$, where $r = (r(c) : c \in \mathcal{C})$ is the unique stationary distribution of $C$.

A2: The input/output symbol pairs $\{(X_i, Y_i) : i \ge 0\}$ are conditionally independent given $C$, so that

$$P(X_0^n = x_0^n, Y_0^n = y_0^n \mid C) = \prod_{i=0}^{n} P(X_i = x_i, Y_i = y_i \mid C)$$

for $x_0^n \in \mathcal{X}^{n+1}$, $y_0^n \in \mathcal{Y}^{n+1}$.

A3: For each pair $(c_0, c_1) \in \mathcal{C}^2$, there exists a probability mass function $q(\cdot, \cdot \mid c_0, c_1)$ on $\mathcal{X} \times \mathcal{Y}$ such that $P(X_i = x, Y_i = y \mid C) = q(x, y \mid C_i, C_{i+1})$.

The non-causal dependence of the symbols is introduced strictly for mathematical convenience. It is clear that typical causal channel models fit into this framework. A number of important channel models are subsumed by A1-A3, in particular channels with ISI, dependent inputs, or any other finite-memory chain (see [9] for some specific examples).

B. Entropy as a Lyapunov Exponent

Let the stationary distribution of the channel be represented as a row vector $r = (r(c) : c \in \mathcal{C})$, and let $e$ be a column vector in which every entry is equal to one. For $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, let $G^{(X,Y)}_{(x,y)} = (G^{(X,Y)}_{(x,y)}(c_0, c_1) : c_0, c_1 \in \mathcal{C})$ be a $|\mathcal{C}| \times |\mathcal{C}|$ matrix with entries given by $G^{(X,Y)}_{(x,y)}(c_0, c_1) = R(c_0, c_1)\, q(x, y \mid c_0, c_1)$. Observe that

$$P(X_0^n = x_0^n, Y_0^n = y_0^n) = \sum_{c_0, \ldots, c_{n+1}} r(c_0) \prod_{j=0}^{n} R(c_j, c_{j+1})\, q(x_j, y_j \mid c_j, c_{j+1}) = r\, G^{(X,Y)}_{(x_0,y_0)} \cdots G^{(X,Y)}_{(x_n,y_n)}\, e.$$

Taking logarithms, dividing by $n$, and letting $n \to \infty$, we conclude that

$$H(X, Y) = -\lim_{n \to \infty} \frac{1}{n} E \log P(X_0^n, Y_0^n) = -\lambda(X, Y), \quad (1)$$

where

$$\lambda(X, Y) = \lim_{n \to \infty} \frac{1}{n} E \log\left( r\, G^{(X,Y)}_{(X_0,Y_0)} \cdots G^{(X,Y)}_{(X_n,Y_n)}\, e \right). \quad (2)$$

The quantity $\lambda(X, Y)$ is known as the largest Lyapunov exponent (or, simply, Lyapunov exponent) associated with the sequence of random matrix products $(G^{(X,Y)}_{(X_0,Y_0)} \cdots G^{(X,Y)}_{(X_n,Y_n)} : n \ge 0)$. Let $\|\cdot\|$ be any matrix norm for which $\|A_1 A_2\| \le \|A_1\|\, \|A_2\|$ for any two matrices $A_1$ and $A_2$. Within the Lyapunov exponent literature, the following result is of central importance.

Theorem 1: Let $(B_n : n \ge 1)$ be a stationary ergodic sequence of random matrices for which $E \log(\max(\|B_1\|, 1)) < \infty$.
Then, there exists a deterministic constant $\lambda$ (known as the Lyapunov exponent) such that

$$\frac{1}{n} \log \|B_1 B_2 \cdots B_n\| \to \lambda \quad \text{a.s.} \quad (3)$$

as $n \to \infty$. Furthermore,

$$\lambda = \lim_{n \to \infty} \frac{1}{n} E \log \|B_1 \cdots B_n\| \quad (4)$$

$$= \inf_{n} \frac{1}{n} E \log \|B_1 \cdots B_n\|. \quad (5)$$

The standard proof of Theorem 1 is based on the sub-additive ergodic theorem due to Kingman. Note that for $\|A\| = \max\{\sum_{c_1} A(c_0, c_1) : c_0 \in \mathcal{C}\}$,

$$\min_{c \in \mathcal{C}} r(c) \left\| G^{(X,Y)}_{(X_0,Y_0)} \cdots G^{(X,Y)}_{(X_n,Y_n)} \right\| \le r\, G^{(X,Y)}_{(X_0,Y_0)} \cdots G^{(X,Y)}_{(X_n,Y_n)}\, e \le \left\| G^{(X,Y)}_{(X_0,Y_0)} \cdots G^{(X,Y)}_{(X_n,Y_n)} \right\|.$$

The positivity of $r$ therefore guarantees that

$$\frac{1}{n} E \log\left( r\, G^{(X,Y)}_{(X_0,Y_0)} \cdots G^{(X,Y)}_{(X_n,Y_n)}\, e \right) - \frac{1}{n} E \log \left\| G^{(X,Y)}_{(X_0,Y_0)} \cdots G^{(X,Y)}_{(X_n,Y_n)} \right\| \to 0 \quad (6)$$

as $n \to \infty$, so that the existence of the limit in (2) may be deduced either from information theory (the Shannon-McMillan-Breiman theorem) or from random matrix theory (Theorem 1). With the channel model described by A1-A3, each of the entropies $H(X)$, $H(Y)$, and $H(X, Y)$ turns out to be a Lyapunov exponent for products of random matrices (up to a change in sign).

Proposition 1: For $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, let $G^X_x = (G^X_x(c_0, c_1) : c_0, c_1 \in \mathcal{C})$, $G^Y_y = (G^Y_y(c_0, c_1) : c_0, c_1 \in \mathcal{C})$, and $G^{(X,Y)}_{(x,y)} = (G^{(X,Y)}_{(x,y)}(c_0, c_1) : c_0, c_1 \in \mathcal{C})$ be $|\mathcal{C}| \times |\mathcal{C}|$ matrices with entries given by

$$G^X_x(c_0, c_1) = R(c_0, c_1) \sum_y q(x, y \mid c_0, c_1),$$

$$G^Y_y(c_0, c_1) = R(c_0, c_1) \sum_x q(x, y \mid c_0, c_1),$$

$$G^{(X,Y)}_{(x,y)}(c_0, c_1) = R(c_0, c_1)\, q(x, y \mid c_0, c_1).$$

Assume A1-A3. Then $H(X) = -\lambda(X)$, $H(Y) = -\lambda(Y)$, and $H(X, Y) = -\lambda(X, Y)$, where $\lambda(X)$, $\lambda(Y)$, and $\lambda(X, Y)$ are the Lyapunov exponents defined as the following limits:

$$\lambda(X) = \lim_{n \to \infty} \frac{1}{n} \log \left\| G^X_{X_1} \cdots G^X_{X_n} \right\| \quad \text{a.s.},$$

$$\lambda(Y) = \lim_{n \to \infty} \frac{1}{n} \log \left\| G^Y_{Y_1} \cdots G^Y_{Y_n} \right\| \quad \text{a.s.},$$

$$\lambda(X, Y) = \lim_{n \to \infty} \frac{1}{n} \log \left\| G^{(X,Y)}_{(X_1,Y_1)} \cdots G^{(X,Y)}_{(X_n,Y_n)} \right\| \quad \text{a.s.}$$
The proof of the above proposition is virtually identical to the argument of Theorem 1, and is therefore omitted.

C. Entropy, Lyapunov Exponents, and Typical Sequences

At this point it is useful to provide a bit of intuition regarding the connection between Lyapunov exponents and entropy. Following the above development we can write

$$P(X_1, \ldots, X_n) = r\, G^X_{X_1} G^X_{X_2} \cdots G^X_{X_n}\, e,$$

where $r$ is the stationary distribution for the channel $C$. Using Proposition 1 and (6) we can interpret the Lyapunov exponent $\lambda(X)$ as the average exponential rate of growth for a random dynamical system that describes the probability of the sequence $X$. Since $P(X_1, \ldots, X_n) \to 0$ as $n \to \infty$ for any non-trivial sequence, this rate of growth will be negative. (If the probability of the input sequence does not converge to zero, then $H(X) = 0$.) This view of the Lyapunov exponent facilitates a straightforward information-theoretic interpretation based on the notion of typical sequences. From Cover and Thomas [3], the typical set $A^n_\epsilon$ is the set of sequences $x_1, \ldots, x_n$ satisfying

$$2^{-n(H(X)+\epsilon)} \le P(X_1 = x_1, \ldots, X_n = x_n) \le 2^{-n(H(X)-\epsilon)},$$

and $P(A^n_\epsilon) > 1 - \epsilon$ for $n$ sufficiently large. Hence we can see that, asymptotically, any observed sequence must be a typical sequence with high probability. Furthermore, the asymptotic exponential rate of growth of the probability of any typical sequence must be $-H(X)$, i.e., $\lambda(X)$. This probability growth rate intuition will be useful in understanding the results presented in the next section, where we show that $\lambda(X)$ can also be viewed as an expectation rather than an asymptotic quantity.

III. A MARKOV CHAIN REPRESENTATION FOR LYAPUNOV EXPONENTS

We will now show that the Lyapunov exponents of interest in this paper can also be represented as expectations with respect to the stationary distributions of a particular class of Markov chains. From this point onward, we focus our attention on the Lyapunov exponent $\lambda(X)$, since the conclusions for $\lambda(Y)$ and $\lambda(X, Y)$ are analogous.
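To make the growth-rate reading of $\lambda(X)$ concrete, here is a minimal numerical sketch (not from the paper; the two-state matrices $R$ and $q$ below are hypothetical) that estimates $\lambda(X)$ along a simulated symbol sequence by accumulating $\log$-norms of the matrix product $r G^X_{x_1} \cdots G^X_{x_n} e$, renormalizing at each step so the product never underflows. In the degenerate case chosen here the symbols are i.i.d. uniform and independent of the channel state, so $H(X) = \log 2$ and the estimate equals $-\log 2$ exactly:

```python
import numpy as np

def lyapunov_estimate(R, q, xs, r):
    """(1/n) log(r G_{x_1} ... G_{x_n} e), accumulated with per-step
    renormalization so the running product never underflows."""
    v = np.asarray(r, dtype=float)
    log_norm = 0.0
    for x in xs:
        v = v @ (R * q[x])       # G^X_x(c0,c1) = R(c0,c1) q(x | c0,c1)
        s = v.sum()              # multiplying by e = all-ones column
        log_norm += np.log(s)
        v = v / s
    return log_norm / len(xs)

# hypothetical 2-state channel; the symbol mass function q is uniform and
# state-independent, so X is i.i.d. uniform and lambda(X) = -log 2 exactly
R = np.array([[0.9, 0.1], [0.2, 0.8]])
q = {0: np.full((2, 2), 0.5), 1: np.full((2, 2), 0.5)}
r = np.array([2/3, 1/3])         # stationary distribution: r R = r
xs = np.random.default_rng(0).integers(0, 2, size=5000)
lam = lyapunov_estimate(R, q, xs, r)
print(lam)   # -log 2, approximately -0.6931
```

For a channel whose symbols actually depend on the state transitions, the same routine gives a Monte Carlo estimate of $\lambda(X)$, and hence of $-H(X)$.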
In much of the literature on Lyapunov exponents for i.i.d. products of random matrices, the basic theoretical tool for analysis is a particular continuous state space Markov chain [7]. Since our matrices are not i.i.d., we use a slightly modified version of this Markov chain, namely

$$Z_n = \left( \frac{w G^X_{X_1} \cdots G^X_{X_n}}{\| w G^X_{X_1} \cdots G^X_{X_n} \|},\; C_n,\; C_{n+1} \right) = (\tilde p_n, C_n, C_{n+1}).$$

Here, $w$ is a $|\mathcal{C}|$-dimensional stochastic (row) vector, and the norm appearing in the definition of $Z_n$ is any norm on $\mathbb{R}^{|\mathcal{C}|}$. If we view $w G^X_{X_1} \cdots G^X_{X_n}$ as a vector, then we can interpret the first component of $Z$ as the direction of the vector at time $n$. The second and third components of $Z$ determine the probability distribution of the random matrix that will be applied at time $n$. We choose the normalized direction vector

$$\tilde p_n = \frac{w G^X_{X_1} \cdots G^X_{X_n}}{\| w G^X_{X_1} \cdots G^X_{X_n} \|}$$

rather than the vector itself because $w G^X_{X_1} \cdots G^X_{X_n} \to 0$ as $n \to \infty$, but we expect some sort of non-trivial steady-state behavior for the normalized version. The steady-state theory for Markov chains on continuous state spaces, while technically sophisticated, is a highly developed area of probability. The Markov chain $Z$ allows one to potentially apply this set of tools to the analysis of the Lyapunov exponent $\lambda(X)$. Assuming for the moment that $Z$ has a steady-state $Z_\infty$, we can then expect to find that

$$Z_n = (\tilde p_n, C_n, C_{n+1}) \Rightarrow Z_\infty = (\tilde p_\infty, C_\infty, C_{\infty+1}) \quad (7)$$

as $n \to \infty$, where $C_\infty, C_{\infty+1} \in \mathcal{C}$, $\tilde p_0 = w$, and

$$\tilde p_n = \frac{w G^X_{X_1} \cdots G^X_{X_n}}{\| w G^X_{X_1} \cdots G^X_{X_n} \|} = \frac{\tilde p_{n-1} G^X_{X_n}}{\| \tilde p_{n-1} G^X_{X_n} \|} \quad (8)$$

for $n \ge 1$. If $w$ is positive, the same argument as that leading to (6) shows that

$$\frac{1}{n} \log \| G^X_{X_1} \cdots G^X_{X_n} \| - \frac{1}{n} \log \| w G^X_{X_1} \cdots G^X_{X_n} \| \to 0 \quad \text{a.s.} \quad (9)$$

as $n \to \infty$, which implies

$$\lambda(X) = \lim_{n \to \infty} \frac{1}{n} \log \| w G^X_{X_1} \cdots G^X_{X_n} \|.$$

Furthermore, it is easily verified that

$$\frac{1}{n} \log \| w G^X_{X_1} \cdots G^X_{X_n} \| = \frac{1}{n} \sum_{j=1}^{n} \log \| \tilde p_{j-1} G^X_{X_j} \|. \quad (10)$$

Relations (9) and (10) together guarantee that

$$\lambda(X) = \lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} \log \| \tilde p_{j-1} G^X_{X_j} \| \quad \text{a.s.} \quad (11)$$

In view of (7), this suggests that

$$\lambda(X) = \sum_{x \in \mathcal{X}} E\left[ \log \| \tilde p_\infty G^X_x \| \; R(C_\infty, C_{\infty+1})\, q(x \mid C_\infty, C_{\infty+1}) \right], \quad (12)$$

where $q(x \mid c_0, c_1) = \sum_y q(x, y \mid c_0, c_1)$.
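As a quick numerical check of the recursion (8), the following sketch (with hypothetical two-state matrices, not the paper's) confirms that the one-step projective update reproduces the direction of the full matrix product:

```python
import numpy as np

def filter_step(p, G):
    """One step of the projective recursion p -> p G / |p G|_1, eq. (8)."""
    v = p @ G
    return v / v.sum()

# hypothetical matrices for a 2-state chain with binary symbols;
# q[0] + q[1] = 1 entrywise, so G[0] + G[1] = R
R = np.array([[0.7, 0.3], [0.4, 0.6]])
q = {0: np.array([[0.9, 0.2], [0.5, 0.1]]),
     1: np.array([[0.1, 0.8], [0.5, 0.9]])}
G = {x: R * q[x] for x in (0, 1)}

w = np.array([0.5, 0.5])
xs = [0, 1, 1, 0, 1]

# iterate the one-step recursion
p = w.copy()
for x in xs:
    p = filter_step(p, G[x])

# same direction as normalizing the full product w G_{x_1} ... G_{x_n} once
prod = w.copy()
for x in xs:
    prod = prod @ G[x]
assert np.allclose(p, prod / prod.sum())
print(p)
```

The equality holds because the normalization at each step only rescales the vector, and scaling commutes with the matrix product.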
Recall the above discussion regarding the intuitive interpretation of Lyapunov exponents and entropy, and suppose we apply the 1-norm, given by $\|w\|_1 = \sum_c |w(c)|$, in (12). Then the representation (12) computes the expected exponential rate of growth of the probability $P(X_1, \ldots, X_n)$, where the expectation is with respect to the stationary distribution of the continuous state space Markov chain $Z$. Thus, assuming the validity of (7) (it turns out that existence of a stationary distribution is not automatic), computing the Lyapunov exponent effectively amounts to computing the stationary distribution of the Markov chain $Z$. Note that while (11) holds for any choice of norm, the 1-norm provides the most intuitive interpretation.
A. The Connection to Hidden Markov Models

As noted above, $Z$ is a Markov chain regardless of the choice of norm on $\mathbb{R}^{|\mathcal{C}|}$. If we specialize to the 1-norm and assume that $\tilde p_0 = r$, it turns out that the first component of $Z$ can be viewed as the standard prediction filter for the channel given the input symbol sequence $X$, and this prediction filter is itself Markov. We state these results in the following propositions, which are proved in [9].

Proposition 2: Assume A1-A3, and let $w = r$, the stationary distribution of the channel $C$. Then, for $n \ge 0$ and $c \in \mathcal{C}$,

$$\tilde p_n(c) = P(C_{n+1} = c \mid X_1^n). \quad (13)$$

Proposition 3: Assume A1-A3 and suppose $w = r$. Then, the sequence $\tilde p = (\tilde p_n : n \ge 0)$ is a Markov chain taking values in the continuous state space $\mathcal{P} = \{w : w \ge 0, \|w\|_1 = 1\}$. Furthermore,

$$\| \tilde p_n G^X_x \|_1 = P(X_{n+1} = x \mid X_1^n). \quad (14)$$

In view of Proposition 3, the terms appearing in the sum (10) have interpretations as conditional entropies, namely

$$E \log \| \tilde p_{j-1} G^X_{X_j} \|_1 = -H(X_j \mid X_1^{j-1}),$$

so that the formula (11) for $\lambda(X)$ can be interpreted as the well-known representation of $H(X)$ in terms of averaged conditional entropies:

$$H(X) = \lim_{n \to \infty} \frac{1}{n} H(X_1, \ldots, X_n) = \lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} H(X_j \mid X_1^{j-1}) = -\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} E \log \| \tilde p_{j-1} G^X_{X_j} \|_1 = -\lambda(X).$$

In addition, an expected-value representation of $H(X)$, similar in spirit to (12), is a well-known result in the hidden Markov model literature [5]. Note, however, that the analysis of the hidden Markov prediction filter $(\tilde p_n : n \ge 0)$ with $w = r$ is only a special case of the problem we consider here. First, the above conditional entropy interpretation of $\log \| \tilde p_{j-1} G^X_{X_j} \|_1$ holds only when we choose to use the 1-norm. Moreover, the above interpretations also require that we initialize $\tilde p$ with $\tilde p_0 = r$, the stationary distribution of the channel $C$ (i.e., Proposition 3 does not hold otherwise). Hence, if we want to use an arbitrary initial vector we must use the multivariate process $Z$, which is always a Markov chain.
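The conditional-entropy reading of the filter can be checked by brute force on a small example (the two-state, binary-input matrices below are illustrative, not from the paper): weighting $-\log \|\tilde p_j G^X_x\|_1$ by the law of $X_1^j$ reproduces $H(X_{j+1} \mid X_1^j) = H(X_1^{j+1}) - H(X_1^j)$ exactly.

```python
import numpy as np
from itertools import product

# hypothetical two-state, binary-input matrices (illustrative only)
R = np.array([[0.7, 0.3], [0.4, 0.6]])
q = {0: np.array([[0.9, 0.2], [0.5, 0.1]]),
     1: np.array([[0.1, 0.8], [0.5, 0.9]])}
G = {x: R * q[x] for x in (0, 1)}
r = np.array([4/7, 3/7])                    # stationary distribution of R

def seq_prob(word):
    """P(X_1^n = word) = r G_{x_1} ... G_{x_n} e."""
    v = r.copy()
    for x in word:
        v = v @ G[x]
    return v.sum()

def block_entropy(n):
    """H(X_1, ..., X_n) in nats, by brute-force enumeration."""
    return -sum(seq_prob(wd) * np.log(seq_prob(wd))
                for wd in product((0, 1), repeat=n))

def cond_entropy(j):
    """H(X_{j+1} | X_1^j) computed through the prediction filter."""
    total = 0.0
    for wd in product((0, 1), repeat=j):
        v = r.copy()
        for x in wd:
            v = v @ G[x]
        pw = v.sum()                        # P(X_1^j = wd)
        p_tilde = v / pw                    # the filter after observing wd
        for x in (0, 1):
            m = (p_tilde @ G[x]).sum()      # P(X_{j+1} = x | X_1^j = wd)
            total -= pw * m * np.log(m)
    return total

print(cond_entropy(3), block_entropy(4) - block_entropy(3))  # equal
```

The agreement is exact (up to floating point), since both sides enumerate the same joint distribution.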
B. Pathologies of the Prediction Filter

It turns out that the prediction filter $\tilde p_n$ with $w = r$, and the general process $Z$, can be extremely ill-behaved Markov chains. In general, these filter processes are not irreducible in even the weakest sense (i.e., Harris recurrence; see [3] for the theory of Harris chains). The key condition required to show that a Markov chain is Harris recurrent is the notion of $\phi$-irreducibility. Consider the Markov chain $Z$ defined on the space $\mathcal{P} \times \mathcal{C} \times \mathcal{C}$ with Borel sets $\mathcal{B}(\mathcal{P} \times \mathcal{C} \times \mathcal{C})$. Define $\tau_A$ as the first return time to the set $A \subseteq \mathcal{P} \times \mathcal{C} \times \mathcal{C}$. Then, the Markov chain $Z$ is $\phi$-irreducible if there exists a nontrivial measure $\phi$ on $\mathcal{B}(\mathcal{P} \times \mathcal{C} \times \mathcal{C})$ such that, for every state $z \in \mathcal{P} \times \mathcal{C} \times \mathcal{C}$,

$$\phi(A) > 0 \Rightarrow P_z(\tau_A < \infty) > 0. \quad (15)$$

However, the Markov chain $Z$ is never irreducible, as illustrated by the following example. Suppose that the output symbol process $Y$ is binary; hence the random matrices $G^Y_{Y_n}$ can take only two values, say $G^Y_0$ and $G^Y_1$, corresponding to output symbols 0 and 1, respectively. Suppose we initialize $\tilde p_0 = r$ and examine the possible values of $\tilde p^r_n$. Notice that for any $n$, the random vector $\tilde p^r_n$ can take on only a finite number of values, where each possible value is determined by one of the $2^n$ ordered $n$-fold products of the matrices $G^Y_0$ and $G^Y_1$, together with the initial condition $\tilde p_0$. One can easily find another initial vector belonging to $\mathcal{P}$, call it $w \ne r$, for which the support of the corresponding $\tilde p^w_n$ is disjoint from the support of $\tilde p^r_n$ for all $n \ge 0$. This contradicts (15). Hence, the Markov chain $Z$ has infinite memory and is not irreducible. This technical difficulty means that we cannot apply the vast number of results available from the standard theory of Harris recurrent Markov chains.
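The finite-support phenomenon is easy to see numerically. The sketch below (hypothetical two-state, binary-symbol matrices, not taken from the paper) enumerates every value the filter can reach in $n$ steps from two different initial vectors; each set has at most $2^n$ points, and for generic pairs of starting vectors the two sets do not intersect:

```python
import numpy as np
from itertools import product

# hypothetical two-state, binary-symbol matrices (illustrative only)
R = np.array([[0.7, 0.3], [0.4, 0.6]])
G = {0: R * np.array([[0.9, 0.2], [0.5, 0.1]]),
     1: R * np.array([[0.1, 0.8], [0.5, 0.9]])}

def reachable(w, n):
    """All values the filter can take after n steps from w:
    one point per n-letter word over {0, 1}, hence a finite set."""
    pts = set()
    for word in product((0, 1), repeat=n):
        v = np.asarray(w, dtype=float)
        for x in word:
            v = v @ G[x]
            v = v / v.sum()
        pts.add(round(v[0], 12))   # first coordinate identifies the point
    return pts

r = np.array([4/7, 3/7])           # stationary distribution of R
w = np.array([0.5, 0.5])           # a different starting vector
A, B = reachable(r, 4), reachable(w, 4)
print(len(A), len(B), A & B)       # two finite supports; typically disjoint
```

Because each $n$-step support is a finite set of points depending on the start, no single nontrivial measure $\phi$ can charge the orbits of all initial conditions, which is exactly the failure of (15).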
For example, we cannot automatically establish: 1) whether the Markov chain $Z$ has a steady-state distribution; 2) the rate of convergence to steady-state; 3) whether the steady-state distribution is unique; or 4) whether the steady-state distribution and entropy are continuous functions of the input symbol distribution and channel transition probabilities. All of these issues are critical if we wish to use the filter Markov chain in proof techniques, and for the purposes of computing entropy and mutual information. In the following sections we address the above problems by exploiting the connection between Lyapunov exponents, products of random matrices, and Shannon entropy. Before we move on, we should note that the authors of [] point out an important exception to this irreducibility problem for the case of ISI channels with Gaussian noise. When Gaussian noise is added to the output symbols, the random matrix $G^Y_{Y_n}$ is selected from a continuous population. In this case the Markov chain $Z$ is in fact irreducible, and standard theory applies. However, since we wish to consider any finite-state channel, including those with finite symbol sets, we cannot appeal to existing Harris chain theory.

IV. COMPUTING THE LYAPUNOV EXPONENT AS AN EXPECTATION

In the previous section we showed that the Lyapunov exponent $\lambda(X)$ can be directly computed as an expectation with respect to the stationary distribution of the Markov chain $Z$. However, in order to make this statement rigorous we must first prove that $Z$ in fact has a stationary distribution.
Furthermore, we should also determine whether the stationary distribution for $Z$ is unique. We address the rate of convergence and continuity issues in Section V. As it turns out, the Markov chain $Z$, with $Z_n = (\tilde p_n, C_n, C_{n+1})$, is a very cumbersome theoretical tool for analyzing many properties of Lyapunov exponents. The main difficulty is that we must carry around the extra augmenting variables $(C_n, C_{n+1})$ in order to make $Z$ a Markov chain. Unfortunately, we cannot utilize the channel prediction filter $\tilde p$ alone, since it is a Markov chain only when $\tilde p_0 = r$. In order to prove properties such as existence and uniqueness of a stationary distribution for a Markov chain, we must be able to characterize the Markov chain's behavior for any initial point. In this section we introduce a new Markov chain $p$, which we will refer to as the P-chain. It is closely related to the prediction filter $\tilde p$ and, in some cases, will be identical to the prediction filter. However, the Markov chain $p$ possesses one important additional property: it is a Markov chain regardless of its initial point. The reason for introducing this new Markov chain is that the asymptotic properties of $p$ are the same as those of the prediction filter $\tilde p$ (we show this in Section V), and the analysis of $p$ is substantially easier than that of $Z$. Therefore the results we are about to prove for $p$ can be applied to $\tilde p$, and hence to the Lyapunov exponent $\lambda(X)$.

A. The Channel P-chain

We define the random evolution of the P-chain using the following algorithm.

Algorithm A:
1) Initialize $n = 0$ and $p_0 = w \in \mathcal{P}$, where $\mathcal{P} = \{w : w \ge 0, \|w\|_1 = 1\}$.
2) Generate $X_{n+1} \in \mathcal{X}$ from the probability mass function $(\|p_n G^X_x\|_1 : x \in \mathcal{X})$.
3) Set $p_{n+1} = p_n G^X_{X_{n+1}} / \| p_n G^X_{X_{n+1}} \|_1$.
4) Set $n = n + 1$ and return to 2).

The output produced by Algorithm A clearly exhibits the Markov property for any initial vector $w \in \mathcal{P}$. Let $p^w = (p^w_n : n \ge 0)$ denote the output of Algorithm A when $p_0 = w$.
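A direct transcription of Algorithm A (a sketch; the two-state channel below is hypothetical) makes the endogenous symbol generation explicit: the mass function driving $X_{n+1}$ is computed from the current state $p_n$, not from an exogenous source.

```python
import numpy as np

def p_chain(G, w, n_steps, rng):
    """Algorithm A: draw X_{n+1} from the mass function
    (|p_n G_x|_1 : x), then apply the projective update."""
    symbols = sorted(G)
    p = np.asarray(w, dtype=float)
    path = [p.copy()]
    for _ in range(n_steps):
        mass = np.array([(p @ G[x]).sum() for x in symbols])
        x = rng.choice(symbols, p=mass)   # endogenous symbol draw
        v = p @ G[x]
        p = v / v.sum()
        path.append(p.copy())
    return path

# hypothetical two-state channel with binary inputs
R = np.array([[0.7, 0.3], [0.4, 0.6]])
q = {0: np.array([[0.9, 0.2], [0.5, 0.1]]),
     1: np.array([[0.1, 0.8], [0.5, 0.9]])}
G = {x: R * q[x] for x in (0, 1)}
path = p_chain(G, [0.5, 0.5], 100, np.random.default_rng(0))
print(path[-1])    # a point of the simplex P
```

Note that the mass function in step 2) always sums to one, since $\sum_x G^X_x = R$ is stochastic and $p_n$ lies on the simplex.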
Proposition 3 proves that for $w = r$, $p^r$ coincides with the sequence $\tilde p^r = (\tilde p^r_n : n \ge 0)$, where $\tilde p^w = (\tilde p^w_n : n \ge 0)$ for $w \in \mathcal{P}$ is defined by the recursion (also known as the forward Baum equation)

$$\tilde p^w_n = \frac{w G^X_{X_1} \cdots G^X_{X_n}}{\| w G^X_{X_1} \cdots G^X_{X_n} \|}, \quad (16)$$

where $X = (X_n : n \ge 1)$ is a stationary version of the input symbol sequence. Note that in the above algorithm the symbol sequence $X$ is determined in an unconventional fashion. In a traditional filtering problem, the symbol sequence $X$ follows an exogenous random process, and the channel state predictor uses the observed symbols to update the prediction vector. However, in Algorithm A the probability distribution of the symbol $X_{n+1}$ depends on the random vector $p_n$; hence the symbol sequence $X$ is not an exogenous process. Rather, the symbols are generated according to a probability distribution determined by the state of the P-chain. Proposition 3 establishes a relationship between the prediction filter $\tilde p^w$ and the P-chain $p^w$ when $w = r$. As noted above, we shall need to study the relationship for arbitrary $w \in \mathcal{P}$. Proposition 4 provides the key link (see [9] for a proof).

Proposition 4: Assume A1-A3. Then, if $w \in \mathcal{P}$,

$$p^w_n = \frac{w G^X_{X_1(w)} \cdots G^X_{X_n(w)}}{\| w G^X_{X_1(w)} \cdots G^X_{X_n(w)} \|}, \quad (17)$$

where $X(w) = (X_n(w) : n \ge 1)$ is the input symbol sequence when the initial channel state is sampled from the mass function $w$. In particular,

$$P(X_1(w) = x_1, \ldots, X_n(w) = x_n) = w G^X_{x_1} \cdots G^X_{x_n} e. \quad (18)$$

Indeed, Proposition 4 is critical to the remaining analysis in this paper, and therefore warrants careful examination. In Algorithm A the probability distribution of the symbol $X_{n+1}$ depends on the state of the Markov chain $p^w_n$. This dependence makes it difficult to explicitly determine the joint probability distribution of the symbol sequence $X_1, \ldots, X_n$. Proposition 4 shows that we can take an alternative view of the P-chain.
Rather than generating the P-chain with an endogenous sequence of symbols $X_1, \ldots, X_n$, we can use the exogenous sequence $X_1(w), \ldots, X_n(w)$, where $X(w) = (X_n(w) : n \ge 1)$ is the input sequence generated when the channel is initialized with the probability mass function $w$. In other words, we can view the chain $\tilde p^w_n$ as being generated by a stationary channel $C$, whereas the P-chain $p^w_n$ is generated by a non-stationary version of the channel, $C(w)$, using $w$ as the initial channel distribution. Hence, the input symbol sequences for the Markov chains $\tilde p^w$ and $p^w$ can be generated by two different versions of the same Markov chain (i.e., the channel). In Section V we will use this critical property (along with some results on products of random matrices) to show that the asymptotic behaviors of $\tilde p^w$ and $p^w$ are identical. The stochastic sequence $\tilde p^w$ is the prediction filter that arises in the study of hidden Markov models. As is natural in the filtering theory context, the filter $\tilde p^w$ is driven by the exogenously determined observations $X$. On the other hand, $p^w$ appears to have no obvious filtering interpretation, except when $w = r$. However, for the reasons discussed above, $p^w$ is the more appropriate object for us to study. As is common in the Markov chain literature, we shall frequently suppress the dependence on $w$, denoting the Markov chain by $p = (p_n : n \ge 0)$.

B. The Lyapunov Exponent as an Expectation

Our goal now is to analyze the steady-state behavior of the Markov chain $p$ and show that the Lyapunov exponent can be computed as an expectation with respect to $p$'s stationary
distribution. In particular, if $p$ has a stationary distribution, we should expect

$$H(X) = -\sum_{x \in \mathcal{X}} E\left[ \log(\| p_\infty G^X_x \|_1)\; \| p_\infty G^X_x \|_1 \right], \quad (19)$$

where $p_\infty$ is a random vector distributed according to $p$'s stationary distribution.

Theorem 2: Assume A1-A3 and let $\mathcal{P}_+ = \{w \in \mathcal{P} : w(c) > 0, c \in \mathcal{C}\}$. Then,

1) For any stationary distribution $\pi$ of $p = (p_n : n \ge 0)$,

$$H(X) \le -\sum_{x \in \mathcal{X}} \int_{\mathcal{P}} \log(\| w G^X_x \|_1)\; \| w G^X_x \|_1\; \pi(dw).$$

2) For any stationary distribution $\pi$ satisfying $\pi(\mathcal{P}_+) = 1$,

$$H(X) = -\sum_{x \in \mathcal{X}} \int_{\mathcal{P}} \log(\| w G^X_x \|_1)\; \| w G^X_x \|_1\; \pi(dw).$$

Proof: See [9] for details. The result follows from a straightforward application of Birkhoff's ergodic theorem and the Monotone Convergence Theorem.

Note that Theorem 2 suggests that $p = (p_n : n \ge 0)$ may have multiple stationary distributions. The following example shows that this may indeed occur, even in the presence of A1-A3.

Example 5: Suppose $\mathcal{C} = \{1, 2\}$ and $\mathcal{X} = \{1, 2\}$, with

$$R = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix} \quad (20)$$

and

$$G^X_1 = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}, \qquad G^X_2 = \begin{pmatrix} 0 & 1/2 \\ 1/2 & 0 \end{pmatrix}. \quad (21)$$

Then both $\pi_1$ and $\pi_2$ are stationary distributions for $p$, where

$$\pi_1((1/2, 1/2)) = 1 \quad (22)$$

and

$$\pi_2((0, 1)) = \pi_2((1, 0)) = 1/2. \quad (23)$$

Theorem 2 leaves open the possibility that stationary distributions with support on the boundary of $\mathcal{P}$ will fail to satisfy (19). Furstenberg and Kifer [7] discuss the behavior of $p = (p_n : n \ge 0)$ when $p$ has multiple stationary distributions, some of which violate (19) (under an invertibility hypothesis on the $G^X_x$'s). Theorem 2 also fails to resolve the question of existence of a stationary distribution for $p$. To remedy this situation we impose additional hypotheses:

A4: $|\mathcal{X}| < \infty$ and $|\mathcal{Y}| < \infty$.

A5: For each $(x, y)$ for which $P(X_0 = x, Y_0 = y) > 0$, the matrix $G^{(X,Y)}_{(x,y)}$ is row-allowable (i.e., it has no row in which every entry is zero).

Theorem 3: Assume A1-A5. Then $p = (p_n : n \ge 0)$ possesses a stationary distribution $\pi$.

Proof: See [9] for details. The row-allowability of $G^{(X,Y)}_{(x,y)}$ allows us to show that $p$ is a Feller chain on a compact space, and therefore it possesses a stationary distribution.
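Formula (19) can be exercised numerically. The sketch below (hypothetical matrices; not from the paper) runs Algorithm A and time-averages $-\sum_x \|p_n G^X_x\|_1 \log \|p_n G^X_x\|_1$, which, when the chain is ergodic, approximates the right-hand side of (19). For the degenerate channel chosen here, whose symbols are i.i.d. with masses $(0.3, 0.7)$ independent of the state, every term equals the true entropy $H(X) = -0.3 \log 0.3 - 0.7 \log 0.7$ exactly:

```python
import numpy as np

def entropy_estimate(G, w, n_steps, rng):
    """Time-average of -sum_x |p G_x|_1 log |p G_x|_1 along the P-chain,
    an ergodic-average approximation to the expectation in (19)."""
    symbols = sorted(G)
    p = np.asarray(w, dtype=float)
    total = 0.0
    for _ in range(n_steps):
        mass = np.array([(p @ G[x]).sum() for x in symbols])
        total -= (mass * np.log(mass)).sum()
        x = rng.choice(symbols, p=mass)   # Algorithm A, step 2
        v = p @ G[x]
        p = v / v.sum()                   # Algorithm A, step 3
    return total / n_steps

# degenerate sanity check: state-independent symbol masses (0.3, 0.7)
R = np.array([[0.9, 0.1], [0.2, 0.8]])
beta = {0: 0.3, 1: 0.7}
G = {x: R * beta[x] for x in (0, 1)}
H = entropy_estimate(G, np.array([0.5, 0.5]), 2000, np.random.default_rng(1))
print(H)   # -0.3 log 0.3 - 0.7 log 0.7, approximately 0.6109 nats
```

For genuinely state-dependent symbol matrices, the same routine gives a Monte Carlo approximation whose accuracy improves with the number of steps.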
As we shall see in the next section, much more can be said about the channel P-chain $p = (p_n : n \ge 0)$ in the presence of strong positivity hypotheses on the matrices $\{G^X_x : x \in \mathcal{X}\}$. The Markov chain $p = (p_n : n \ge 0)$, as studied in this section, is challenging largely because we permit a great deal of sparsity in the matrices $\{G^X_x : x \in \mathcal{X}\}$. The challenges we face here are largely driven by the inherent complexity of the behavior that Lyapunov exponents can exhibit in the presence of such sparsity. For example, Peres [4], [5] provides examples of discontinuity and lack of smoothness in the Lyapunov exponent, as a function of the input symbol distribution, when the random matrices have a sparsity structure like that of this section. These examples strongly suggest that entropy can be discontinuous under A1-A5 alone. We will alleviate these problems in the next section through additional assumptions on the aperiodicity of the channel, as well as on the conditional probability distributions of the input/output symbols.

V. THE STATIONARY DISTRIBUTION OF THE CHANNEL P-CHAIN UNDER POSITIVITY CONDITIONS

In this section we introduce extra conditions that guarantee the existence of a unique stationary distribution for the Markov chains $p$ and $\tilde p^r$. By necessity, the discussion in this section is rather technical. Hence we first summarize the results of this section and then explain further details. The key assumption we make in this section is that the probability of observing any symbol pair $(x, y)$ is strictly positive for any valid channel transition (i.e., whenever $R(c_0, c_1)$ is positive); recall that the probability mass function $q(x, y \mid c_0, c_1)$ for the input/output symbols depends on the channel transition rather than just the channel state. This assumption, together with aperiodicity of $R$, guarantees that the random matrix product $G^X_{X_1(w)} \cdots G^X_{X_n(w)}$ can be split into a product of strictly positive random matrices.
We then exploit the fact that strictly positive matrices are strict contractions on $\mathcal{P}_+ = \{w \in \mathcal{P} : w(c) > 0, c \in \mathcal{C}\}$ for an appropriate distance metric. This contraction property allows us to show that both the prediction filter $\tilde p^r$ and the P-chain $p$ converge exponentially fast to the same limiting random variable. Hence, both $p$ and $\tilde p^r$ have the same unique stationary distribution, which we can use to compute the Lyapunov exponent $\lambda(X)$. This result is stated formally in Theorem 5. In Theorem 6 we show that $\lambda(X)$ is a continuous function of both the transition matrix $R$ and the symbol probabilities $q(x, y \mid c_0, c_1)$.

A. The Contraction Property of Positive Matrices

We assume here that:

A6: The transition matrix $R$ is aperiodic.
A7: For each $(c_0, c_1, x, y) \in \mathcal{C}^2 \times \mathcal{X} \times \mathcal{Y}$, $q(x, y \mid c_0, c_1) > 0$ whenever $R(c_0, c_1) > 0$.

Under A6-A7, all the matrices $\{G^X_x, G^Y_y, G^{(X,Y)}_{(x,y)} : x \in \mathcal{X}, y \in \mathcal{Y}\}$ exhibit the same (aperiodic) sparsity pattern as $R$. That is, the matrices have the same pattern of zero and non-zero elements. Note that under A1 and A6, $R^l$ is strictly positive for some finite value of $l$. So,

$$G^X_{X_{(j-1)l+1}} \cdots G^X_{X_{jl}} \quad (24)$$

is strictly positive for $j \ge 1$. The key mathematical property that we shall now repeatedly exploit is the fact that positive matrices are contracting on $\mathcal{P}_+$ in a certain sense. For $v, w \in \mathcal{P}_+$, let

$$d(v, w) = \log\left( \frac{\max_c (v(c)/w(c))}{\min_c (v(c)/w(c))} \right). \quad (25)$$

The distance $d(v, w)$ is called Hilbert's projective distance between $v$ and $w$, and is a metric on $\mathcal{P}_+$; see Seneta [6]. For any non-negative matrix $T$, let

$$\tau(T) = \frac{\theta(T)^{1/2} - 1}{\theta(T)^{1/2} + 1}, \quad (26)$$

where

$$\theta(T) = \max_{c_0, c_1, c_2, c_3} \frac{T(c_0, c_2)\, T(c_1, c_3)}{T(c_0, c_3)\, T(c_1, c_2)}. \quad (27)$$

Note that $\tau(T) < 1$ if $T$ is strictly positive (i.e., if all the elements of $T$ are strictly positive).

Theorem 4: Suppose $v, w \in \mathcal{P}_+$ are row vectors. Then, if $T$ is strictly positive,

$$d(vT, wT) \le \tau(T)\, d(v, w). \quad (28)$$

For a proof, see Seneta [6]. The quantity $\tau(T)$ is called Birkhoff's contraction coefficient. Our first application of this idea is to establish that the asymptotic behaviors of the channel P-chain $p$ and the prediction filter $\tilde p$ coincide. Note that for $n \ge l$, $\tilde p^r_n$ and $\tilde p^w_n$ both lie in $\mathcal{P}_+$, so $d(\tilde p^r_n, \tilde p^w_n)$ is well-defined for $n \ge l$. Proposition 5 will allow us to show that $\tilde p^w = (\tilde p^w_n : n \ge 0)$ has a unique stationary distribution. Proposition 6 will allow us to show that $p^w = (p^w_n : n \ge 0)$ must have the same stationary distribution as $\tilde p^w$.

Proposition 5: Assume A1-A4 and A6-A7. If $w \in \mathcal{P}$, then $d(\tilde p^r_n, \tilde p^w_n) = O(e^{\alpha n})$ a.s. as $n \to \infty$, where $\alpha = (\log \beta)/l < 0$ and $\beta = \max\{\tau(G^X_{x_1} \cdots G^X_{x_l}) : P(X_1 = x_1, \ldots, X_l = x_l) > 0\}$.

Proposition 6: Assume A1-A4 and A6-A7. For $w \in \mathcal{P}$, there exists a probability space on which $d(p^w_n, \tilde p^r_n) = O(e^{\alpha n})$ a.s. as $n \to \infty$.
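Definitions (25)-(28) translate directly into code. This sketch checks the contraction inequality (28) on a hypothetical strictly positive 2 x 2 matrix (the numbers are illustrative):

```python
import numpy as np
from itertools import product

def hilbert_dist(v, w):
    """Hilbert's projective distance (25) on the interior of the simplex."""
    ratio = v / w
    return np.log(ratio.max() / ratio.min())

def birkhoff_tau(T):
    """Birkhoff's contraction coefficient (26), via the maximal
    cross-ratio theta(T) of (27)."""
    n = T.shape[0]
    theta = max(T[a, c] * T[b, d] / (T[a, d] * T[b, c])
                for a, b, c, d in product(range(n), repeat=4))
    s = np.sqrt(theta)
    return (s - 1) / (s + 1)

T = np.array([[0.6, 0.4], [0.1, 0.9]])      # strictly positive
v = np.array([0.3, 0.7])
w = np.array([0.8, 0.2])
lhs = hilbert_dist(v @ T, w @ T)            # d is scale-invariant, so no
rhs = birkhoff_tau(T) * hilbert_dist(v, w)  # renormalization is needed
print(lhs, "<=", rhs)                       # contraction inequality (28)
```

Since $d$ depends only on ratios of coordinates, $d(vT, wT)$ can be evaluated without normalizing $vT$ and $wT$ back onto the simplex.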
The proofs of Propositions 5 and 6 rely on Proposition 4 and a coupling argument that we summarize here (see [9] for the details). Recall from Proposition 4 that we can view $\tilde p^r_n$ and $p^w_n$ as being generated by a stationary and a non-stationary version of the channel $C$, respectively. The key idea is that along each sample path the non-stationary version of the channel will eventually couple with the stationary version. Once the channels couple, the non-stationary version of the symbol sequence $X(w)$ will also couple with the stationary version $X$. When this coupling occurs, say at time $T < \infty$, the symbol sequences $(X_n(w) : n > T)$ and $(X_n : n > T)$ are identical. This means that for all $n > T$ the matrices applied to $\tilde p^r_n$ and $p^w_n$ are also identical. This allows us to apply the contraction result from Theorem 4 and complete the proofs. Note that without the contraction property of positive matrices, we would not be able to prove this convergence argument, due to the infinite memory problem discussed earlier.

B. A Unique Stationary Distribution for the Prediction Filter and the P-Chain

We will now show that there exists a limiting random variable $p_\infty$ such that $\tilde p^r_n \Rightarrow p_\infty$ as $n \to \infty$. In view of Propositions 5 and 6, this ensures that for each $w \in \mathcal{P}$, $p^w_n \Rightarrow p_\infty$ as $n \to \infty$. To prove this result, we use an idea borrowed from the theory of random iterated functions; see Diaconis and Freedman [4]. We leave the technical details of the argument to [9] and present a summary of the proof technique here. Let $X = (X_n : -\infty < n < \infty)$ be a doubly-infinite stationary version of the input symbol sequence, and put $\chi_n = X_{-n}$ for $n \in \mathbb{Z}$. Then,

$$r\, G^X_{X_1} \cdots G^X_{X_n} \stackrel{D}{=} r\, G^X_{\chi_n} \cdots G^X_{\chi_1}, \quad (29)$$

where $\stackrel{D}{=}$ denotes equality in distribution. Put $p^-_0 = r$ and

$$p^-_n = \frac{r\, G^X_{\chi_n} \cdots G^X_{\chi_1}}{\| r\, G^X_{\chi_n} \cdots G^X_{\chi_1} \|}$$

for $n \ge 0$. The process $(p^-_n : n \ge 0)$ is a time-reversed version of $\tilde p^r$, such that $\tilde p^r_n \stackrel{D}{=} p^-_n$ for all $n \ge 0$. Take careful note of the order in which matrices are applied to $p^-_n$ (they are applied in reverse).
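The effect of reversing the order is easy to see numerically. In the sketch below (hypothetical strictly positive matrices, not the paper's), each newly observed matrix is applied at the front of the product, and the successive iterates have gaps that shrink geometrically, i.e., they form a Cauchy sequence, exactly as the reversal argument requires:

```python
import numpy as np

# hypothetical matrices, strictly positive as under A6-A7
R = np.array([[0.7, 0.3], [0.4, 0.6]])
q = {0: np.array([[0.9, 0.2], [0.5, 0.1]]),
     1: np.array([[0.1, 0.8], [0.5, 0.9]])}
G = {x: R * q[x] for x in (0, 1)}

def normalized_product(w, word):
    """w G_{word[0]} G_{word[1]} ..., renormalized at each step."""
    v = np.asarray(w, dtype=float)
    for x in word:
        v = v @ G[x]
        v = v / v.sum()
    return v

rng = np.random.default_rng(2)
xs = list(rng.integers(0, 2, size=30))
r = np.array([4/7, 3/7])             # stationary distribution of R

# backward iterates: each new matrix acts at the FRONT of the product,
# so consecutive iterates share the same long contracting tail
backward = [normalized_product(r, xs[:n][::-1]) for n in range(1, 31)]
gaps = [abs(backward[k][0] - backward[k - 1][0]) for k in range(1, 30)]
print(gaps[0], gaps[-1])             # successive gaps shrink geometrically
```

Consecutive backward iterates differ only in their starting vector and are then multiplied by the same strictly positive tail product, so Theorem 4 forces their Hilbert distance (and hence the gap above) to contract at each step; the forward iterates enjoy no such property.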
Hence, the matrix observed at time n is applied at the front of the matrix product rather than at the end (as is the case with \tilde{p}^r_n). Then, looking at consecutive values of \bar{p}_n, we can show that (\bar{p}_n) is a.s. a Cauchy sequence (through the contraction property). Hence, there exists a random variable p_∞ such that \bar{p}_n → p_∞ a.s. as n → ∞. This guarantees weak convergence of the forward process \tilde{p}^r to a random variable with the same distribution as p_∞. This reversal argument is remarkably useful since we have no means of directly showing that the forward process converges. In [9] we provide the details needed to prove the following theorem.

Theorem 5: Assume A1-A4 and A6-A7. Then,
Authorized licensed use limited to: Stanford University. Downloaded on July 0,00 at 05:9:09 UTC from IEEE Xplore. Restrictions apply.
i. p = (p_n : n ≥ 0) has a unique stationary distribution π.
ii. There exists a set K ⊂ P^+ such that π(K) = 1 and π(·) = P(p_∞ ∈ ·).
iii. For each w ∈ P, p^w_n ⇒ p_∞ as n → ∞.
iv. K is absorbing for (p_{ln} : n ≥ 0), in the sense that P(p^w_{ln} ∈ K) = 1 for n ≥ 0 and w ∈ K.

Applying Theorem 2, we may conclude that under A1-A4 and A6-A7, the channel P-chain has a unique stationary distribution π on P^+ satisfying

H(X) = − Σ_{x ∈ X} ∫_P ‖w G^X_x‖ log(‖w G^X_x‖) π(dw).     (30)

We can also use our Markov chain machinery to establish continuity of the entropy H(X) as a function of R and q. Such a continuity result is of theoretical importance in optimizing the mutual information between X and Y, which is necessary to compute channel capacity. The following theorem generalizes a continuity result of [8] obtained in the setting of i.i.d. input symbol sequences.

Theorem 6: Assume A1-A4 and A6-A7. Suppose that (R_n : n ≥ 1) is a sequence of transition matrices on C for which R_n → R as n → ∞. Also, suppose that for n ≥ 1, q_n(· | c_0, c_1) is a probability mass function on X × Y for each (c_0, c_1) ∈ C × C, and that q_n → q as n → ∞. If H_n(X) is the entropy of X associated with the channel model characterized by (R_n, q_n), then H_n(X) → H(X) as n → ∞.

Proof: See [9] for the details. With Theorem 5 in hand, we need only a standard weak convergence argument for the expected value of a continuous function on a compact set.

Our final result presents a simple and intuitive connection between the stationary prediction filter p_∞, the Lyapunov exponent (or entropy), and the random matrix process G^X_X. It turns out that these three quantities are related through the following random eigenvector-eigenvalue relation:

p_∞ G^X_{X_1} =_D γ p_∞,     (31)

where E_π[log γ] = −H(X) and π is the stationary distribution from Theorem 5. Hence, the stationary filter random variable is a random eigenvector for the random matrix G^X_{X_1}. The associated random eigenvalue γ then determines the Lyapunov exponent. A proof of this result follows directly from [1].

VI.
CONCLUSIONS

This work examines some of the theoretical issues that arise when investigating the connections between Lyapunov exponents, Shannon entropy, and hidden Markov models. We first demonstrated that entropies for finite-state channels are equivalent to Lyapunov exponents for products of random matrices. We then showed that entropy can be expressed as an expectation with respect to the stationary distribution of a Markov chain that is closely related to the hidden Markov prediction filter. As a consequence, we proved that the prediction filter is a non-irreducible Markov chain. By applying tools from the theory of random matrix products we were able to provide new regularity results for the prediction filter, even though the filter Markov chain is not Harris recurrent.

The theoretical tools presented here may also be used to compute entropy and mutual information for finite-state Markov channels. The non-irreducibility discussed above creates many difficulties for simulation-based computational algorithms, and we address these issues as well in [9].

Finally, we should note that this connection between Lyapunov exponents and Information Theory is just a first step. We now have access to a wide range of tools from statistical mechanics that can be applied to problems in Information Theory and hidden Markov models. Clearly, a significant amount of translation work between the languages of the different fields needs to occur, but the mathematical connections and potential results appear promising. Indeed, it might now be possible to fully extend the notion of a Thermodynamic Formalism [2] to Shannon's Theory of Information, thereby permitting information-theoretic interpretations of many quantities in quantum physics and dynamical systems.

REFERENCES

[1] L. Arnold, L. Demetrius, and M. Gundlach, "Evolutionary formalism for products of positive random matrices," Annals of Applied Probability, vol. 4, 1994.
[2] C. Beck and F.
Schlögl, Thermodynamics of Chaotic Systems: An Introduction, Cambridge Nonlinear Science Series, vol. 4, Cambridge University Press, 1993.
[3] T. Cover and J. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.
[4] P. Diaconis and D. Freedman, "Iterated random functions," SIAM Review, vol. 41, 1999.
[5] Y. Ephraim and N. Merhav, "Hidden Markov processes," IEEE Trans. Information Theory, vol. 48, June 2002.
[6] H. Furstenberg and H. Kesten, "Products of random matrices," Ann. Math. Statist., 1960.
[7] H. Furstenberg and Y. Kifer, "Random matrix products and measures on projective spaces," Israel J. Math., vol. 46, pp. 12-32, 1983.
[8] A. Goldsmith and P. Varaiya, "Capacity, mutual information, and coding for finite state Markov channels," IEEE Trans. Information Theory, vol. 42, May 1996.
[9] T. Holliday, P. Glynn, and A. Goldsmith, "On entropy and Lyapunov exponents for finite-state channels," submitted to IEEE Trans. Information Theory.
[10] T. Holliday, A. Goldsmith, and P. Glynn, "Capacity of finite state Markov channels with general inputs," IEEE ISIT 2003, Yokohama, Japan, 2003.
[11] J. Kingman, "Subadditive ergodic theory," Annals of Probability, vol. 1, 1973.
[12] F. LeGland and L. Mevel, "Exponential forgetting and geometric ergodicity in hidden Markov models," Mathematics of Control, Signals and Systems, vol. 13, 2000.
[13] S. Meyn and R. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, 1994.
[14] Y. Peres, "Domains of analytic continuation for the top Lyapunov exponent," Ann. Inst. H. Poincaré Probab. Statist., vol. 28, pp. 131-148, 1992.
[15] Y. Peres, K. Simon, and B. Solomyak, "Absolute continuity for random iterated function systems with overlaps," preprint, 2005.
[16] E. Seneta, Non-negative Matrices and Markov Chains, second edition, Springer-Verlag.
More informationCapacity of Block Rayleigh Fading Channels Without CSI
Capacity of Block Rayleigh Fading Channels Without CSI Mainak Chowdhury and Andrea Goldsmith, Fellow, IEEE Department of Electrical Engineering, Stanford University, USA Email: mainakch@stanford.edu, andrea@wsl.stanford.edu
More informationRelative growth of the partial sums of certain random Fibonacci-like sequences
Relative growth of the partial sums of certain random Fibonacci-like sequences Alexander Roitershtein Zirou Zhou January 9, 207; Revised August 8, 207 Abstract We consider certain Fibonacci-like sequences
More informationBoolean Inner-Product Spaces and Boolean Matrices
Boolean Inner-Product Spaces and Boolean Matrices Stan Gudder Department of Mathematics, University of Denver, Denver CO 80208 Frédéric Latrémolière Department of Mathematics, University of Denver, Denver
More informationPrediction-based adaptive control of a class of discrete-time nonlinear systems with nonlinear growth rate
www.scichina.com info.scichina.com www.springerlin.com Prediction-based adaptive control of a class of discrete-time nonlinear systems with nonlinear growth rate WEI Chen & CHEN ZongJi School of Automation
More informationThe Lebesgue Integral
The Lebesgue Integral Brent Nelson In these notes we give an introduction to the Lebesgue integral, assuming only a knowledge of metric spaces and the iemann integral. For more details see [1, Chapters
More information