Entropy of Markov Information Sources and Capacity of Discrete Input Constrained Channels (from Immink, Coding Techniques for Digital Recorders)

1. Entropy of Markov Chains

We have already introduced the notion of entropy in a conceptually simple situation: it was assumed that the symbols are independent and occur with fixed probabilities. That is, the occurrence of a specific symbol at a certain instant does not alter the probability of occurrence of symbols during any other symbol intervals. We need to extend the concept of entropy to more complicated structures, where symbols are not chosen independently but their probabilities of occurrence depend on preceding symbols. It is to be emphasized that nearly all practical sources emit sequences of symbols that are statistically dependent. Sequences formed by the English language are an excellent example: occurrence of the letter Q implies that the letter to follow is probably a U. Regardless of the form of the statistical dependence, or structure, among the successive source outputs, the effect is that the amount of information coming out of such a source is smaller than from a source emitting the same set of characters in independent sequences. The development of a model for sources with memory is the focus of the ensuing discussion.

In probability theory the notation Pr(A|B) means the probability of occurrence of event A given that event B has occurred. Many of the structures that will be encountered in the subsequent chapters can usefully be modeled in terms of a Markov chain. A Markov chain is a special type of stochastic process distinguished by a certain Markov property. A (discrete) Markov chain is defined as a discrete random process {Z_1, Z_2, Z_3, ...}, where the variables Z_t are dependent discrete random variables taking values in the state alphabet S = {s_1, ..., s_N}, and the dependence satisfies the Markov condition

  Pr(Z_t = s_{j_t} | Z_{t-1} = s_{j_{t-1}}, Z_{t-2} = s_{j_{t-2}}, ..., Z_1 = s_{j_1}) = Pr(Z_t = s_{j_t} | Z_{t-1} = s_{j_{t-1}}).

In words, the variable Z_t is independent of past samples Z_{t-2}, Z_{t-3}, ... if the value of Z_{t-1} is known.

A (homogeneous) Markov chain can be described by a transition probability matrix Q with elements

  q_ij = Pr(Z_{t+1} = s_j | Z_t = s_i),   1 <= i, j <= N.

The transition probability matrix Q is a stochastic matrix, that is, its entries are nonnegative and the entries of each row sum to one. Any stochastic matrix constitutes a valid transition probability matrix.
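As a small illustration of these definitions, the following minimal sketch (Python with NumPy; the three-state matrix is a made-up example, not one taken from the text) builds a transition probability matrix and checks that it is stochastic:

```python
import numpy as np

# Hypothetical transition probability matrix Q of a 3-state Markov chain.
# Entry Q[i, j] = Pr(Z_{t+1} = s_j | Z_t = s_i).
Q = np.array([
    [0.5,  0.5,  0.0],
    [0.25, 0.5,  0.25],
    [0.0,  0.5,  0.5],
])

def is_stochastic(Q, tol=1e-12):
    """A valid transition matrix has nonnegative entries and unit row sums."""
    return bool(np.all(Q >= 0) and np.allclose(Q.sum(axis=1), 1.0, atol=tol))

print(is_stochastic(Q))  # True
```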

Imagine the process starts at time t = 1 by choosing an initial state in accordance with a specified probability distribution. If we are in state s_i at time t = 1, then the process moves at t = 2 to a possibly new state; the quantity q_ij is the probability that the process will move to state s_j at time t = 2. If we are in state s_j at instant t = 2, we move to state s_k at instant t = 3 with probability q_jk. This procedure is repeated ad infinitum.

The state-transition diagram of a Markov chain, portrayed in the following figure (a), represents a Markov chain as a directed graph where the states are embodied by the nodes or vertices of the graph; a transition between states is represented by a directed line, an edge, from the initial to the final state. The transition probabilities q_ij corresponding to the various transitions are shown marked along the lines of the graph. Another useful representation of a Markov chain is provided by a trellis (or lattice) diagram (see (b)).

[Figure: Alternative representations of a three-state Markov chain. (a) State-transition diagram, (b) trellis diagram for the same chain. Permissible transitions from one state to the other are depicted by lines.]

The trellis is a state diagram augmented by a time axis, so that it provides for easy visualization of how the states change with time.

In theory, there are many types of Markov chains; here we restrict our attention to chains that are ergodic and regular. Roughly speaking, ergodicity means that from any state the chain can eventually reach any other state; regularity means that the Markov chain is non-periodic. In the practical structures that will be encountered, all these conditions hold.

We shall now take a closer look at the dynamics of a Markov chain. To that end, let w_i^(t) = Pr(Z_t = i) represent the probability of being in state i at time t. Clearly,

  Σ_{i=1}^{N} w_i^(t) = 1.

The probability of being in state j at time t may be expressed in the state probabilities at instant t-1:

  w_j^(t) = Σ_{i=1}^{N} w_i^(t-1) q_ij.
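A minimal sketch of this one-step update, continuing with the hypothetical matrix Q above and an assumed initial distribution concentrated on the first state:

```python
import numpy as np

Q = np.array([
    [0.5,  0.5,  0.0],
    [0.25, 0.5,  0.25],
    [0.0,  0.5,  0.5],
])

w = np.array([1.0, 0.0, 0.0])  # assumed initial state distribution w^(1)

# Repeatedly apply w_j^(t) = sum_i w_i^(t-1) * q_ij.
for t in range(2, 12):
    w = np.array([sum(w[i] * Q[i, j] for i in range(len(w)))
                  for j in range(len(w))])
    print(t, w, w.sum())  # the probabilities keep summing to one
```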

The previous update equation suggests the use of matrices. If we introduce the state distribution vector

  w^(t) = (w_1^(t), ..., w_N^(t)),

then the previous equation can succinctly be expressed in an elegant matrix/vector notation, thus

  w^(t) = w^(t-1) Q.

By iteration we obtain

  w^(t) = w^(1) Q^(t-1).

In other words, the state distribution vector at time t is the product of the state distribution vector at time t = 1 and the (t-1)th power of the transition matrix. It is easy to see that Q^(t-1) is also a stochastic matrix. The previous formula is equivalent to the assertion that the n-step transition matrix is the nth power of the single-step transition matrix Q. We note also that Q^0 = I is the ordinary identity matrix.

We shall concentrate now on the limiting behavior of the state distribution vector as t → ∞. In many cases of practical interest there is only one such limiting distribution vector, denoted by π = (π_1, ..., π_N). In the long run the state distribution vector converges to this equilibrium distribution vector from any valid initial state probability vector w^(1), so

  lim_{t→∞} w^(t) = π.

The number π_i is called the steady-state, or stationary, probability of state i. The equilibrium distribution vector can be obtained by solving the system of linear equations in the N unknowns π_1, ..., π_N:

  π Q = π.

Only N-1 of these N equations are independent, so we solve the top N-1 along with the normalizing condition

  Σ_{i=1}^{N} π_i = 1.

The proof is elementary: we note that if πQ = π, then πQ^(t-1) = π for every t, so the distribution π is preserved at every instant.

Decomposition of the initial state vector w^(1) in terms of the eigenvectors of Q can be convenient to demonstrate the process of convergence. The matrix Q has N eigenvalues {λ_1, ..., λ_N}, which can be found by solving the characteristic equation

  det(Q - λI) = 0,

where I is the identity matrix, and N (left) eigenvectors {u_1, ..., u_N}, each of which is a solution of the system

  u_i Q = λ_i u_i.

Provided that the λ_i, i = 1, ..., N, are distinct, there are N independent eigenvectors, and the eigenvectors u_i, i = 1, ..., N, constitute a basis. The initial state vector may then be written as

  w^(1) = Σ_{i=1}^{N} a_i u_i.
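A minimal sketch of this recipe, again using the hypothetical Q from the earlier sketches: replace one of the dependent equations of πQ = π by the normalization and solve, then check that w^(1) Q^(t-1) approaches the same vector:

```python
import numpy as np

Q = np.array([
    [0.5,  0.5,  0.0],
    [0.25, 0.5,  0.25],
    [0.0,  0.5,  0.5],
])
N = Q.shape[0]

# pi Q = pi  is equivalent to  (Q^T - I) pi^T = 0; only N-1 of these equations
# are independent, so one of them is replaced by the normalization sum(pi) = 1.
A = Q.T - np.eye(N)
A[-1, :] = np.ones(N)
b = np.zeros(N)
b[-1] = 1.0
pi = np.linalg.solve(A, b)
print("stationary distribution:", pi)

# Convergence check: w^(1) Q^(t-1) -> pi from any valid initial distribution.
w = np.array([1.0, 0.0, 0.0])
print(np.allclose(w @ np.linalg.matrix_power(Q, 50), pi))  # True
```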

We find the state distribution vector w^(t) at instant t:

  w^(t) = w^(1) Q^(t-1) = Σ_{i=1}^{N} a_i λ_i^(t-1) u_i.

If it is assumed that the eigenvalues are distinct, the {λ_i} can be ordered such that λ_1 > |λ_2| > |λ_3| > ..., etc. Combination of the previous formulae reveals that π is an eigenvector with unity eigenvalue, thus λ_1 = 1. We then have

  w^(t) = a_1 π + Σ_{i=2}^{N} a_i λ_i^(t-1) u_i,

and convergence to π is assured since |λ_i| < 1, i ≥ 2.

2. Entropy of Markov Information Sources

We are now in the position to describe a Markov information source. Given a finite Markov chain {Z_t} and a function ζ whose domain is the set of states of the chain and whose range is a finite set G, the source alphabet, the sequence {X_t}, where X_t = ζ(Z_t), is said to be the output of a Markov information source corresponding to the chain {Z_t} and the function ζ. In general, the number of states can be larger than the cardinality of the source alphabet, which means that one output symbol may correspond to more than one state. The essential feature of the Markov information source is that it provides for dependence between successive symbols, which introduces redundancy in the message sequence. Each symbol conveys less information than it is capable of conveying, since it is to some extent predictable from the preceding symbol.

In the foregoing description of an information source we assumed that the symbol emitted is solely a function of the state that is entered. This type of description is usually called a Moore-type Markov source. In a different description, called the Mealy-type Markov source, the symbols emitted are a function of the transitions of the Markov chain: X̂_t = ζ̂(Z_t, Z_{t+1}). In other words, a Mealy-type Markov source is obtained by labelling the edges of the directed graph that represents the Markov chain. Mealy- and Moore-type descriptions are equivalent. Let a Mealy-type machine be given. By defining a Markov information source with state set composed of the triples (σ_i, σ_j, ζ̂(σ_i, σ_j)) for which q_ij > 0, and with label ζ̂(σ_i, σ_j) attached to the state (σ_i, σ_j, ζ̂(σ_i, σ_j)), we obtain a Moore-type Markov source. The Moore-type model is referred to as the edge graph of the Mealy-type model. An example of a Mealy-type information source and its Moore-type equivalent are shown in the following figure.

[Figure: (a) Example of a Mealy-type two-state Markov information source, and (b) its four-state Moore-type counterpart.]

The idea of a Markov source has enabled us to represent certain types of structure in streams of data. We next examine the information content, or entropy, of a sequence emitted by a Markov source. The entropy of a Markov information source is hard to compute in most cases. For a certain class of Markov information sources, termed unifilar Markov information sources, the computation may be greatly simplified. The word unifilar refers to the following property. Let a Markov information source with set of states Σ = {σ_1, σ_2, ..., σ_N}, output alphabet G, and associated output function ζ(Z_t) be given. For each state σ_i ∈ Σ, let σ_{i_1}, ..., σ_{i_n} be the states that can be reached in one step from σ_i, that is, the states σ_j such that q_ij > 0; we say σ_j is a successor of σ_i if q_ij > 0. The source is said to be unifilar if for each state σ_i the symbols ζ(σ_{i_1}), ..., ζ(σ_{i_n}) are distinct. In other words, each successor of σ_i must be associated with a distinct symbol. Provided this condition is met and the initial state of the Markov information source is known, the sequence of emitted symbols determines the sequence of states followed by the chain, and a simple formula is available for the entropy of the emitted X-process.

Given a unifilar Markov source, as above, let σ_{i_1}, ..., σ_{i_n} be the successors of σ_i; then it is quite natural to define the uncertainty H_i of state σ_i as

  H_i = H(q_{i i_1}, ..., q_{i i_n}),

with H(p_1, ..., p_M) defined as

  H(p_1, ..., p_M) = -Σ_{j=1}^{M} p_j log_2 p_j.

Shannon defined the entropy of the unifilar Markov source as the average of these H_i, weighted in accordance with the steady-state probability of being in the state in question, that is, by the expression

  H{X} = Σ_{i=1}^{N} π_i H_i.

Note that we use the notation H{X} to express the fact that we are considering the entropy of sequences {X} and not the function H(.).
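A minimal sketch of this formula, assuming the hypothetical transition matrix Q and its stationary vector from the earlier sketches (and assuming the corresponding source is unifilar, so that this expression really is the source entropy):

```python
import numpy as np

def markov_source_entropy(Q, pi):
    """H{X} = sum_i pi_i * H_i, where H_i = -sum_j q_ij * log2(q_ij)."""
    H_states = np.array([
        -sum(q * np.log2(q) for q in row if q > 0)  # uncertainty H_i of state i
        for row in Q
    ])
    return float(pi @ H_states)

Q = np.array([
    [0.5,  0.5,  0.0],
    [0.25, 0.5,  0.25],
    [0.0,  0.5,  0.5],
])
pi = np.array([0.25, 0.5, 0.25])  # stationary distribution of this Q

print(markov_source_entropy(Q, pi))  # 1.25 bits per symbol for this example
```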

The next numerical example may serve to illustrate the theory.

Example: Consider the three-state unifilar Markov chain depicted in the figure above. From the diagram we may read off the transition probability matrix. What is the average probability of being in one of the three states? We find the system of linear equations π Q = π that governs the steady-state probabilities, from which we obtain π_1 and π_3 in terms of π_2; since π_1 + π_2 + π_3 = 1, the three steady-state probabilities follow. The entropy of the information source is then found by evaluating H{X} = Σ_i π_i H_i with these steady-state probabilities.

In the next section we consider a problem which is central to the field of input-constrained channels. We focus on methods to compute the maximum amount of information that can be sent over an input-constrained channel per unit of time.

3. Capacity of Discrete Noiseless Channels

Shannon defined the capacity C of a discrete noiseless channel by

  C = lim_{T→∞} (log_2 N(T)) / T,

where N(T) is the number of admissible signals of duration T. The problem of calculation of the capacity for constrained channels is in essence a combinatorial problem, that is, finding the number of allowed sequences N(T). This fundamental definition will be worked out in a moment for some specific channel models. We start, since virtually all channel constraints can be modeled as such, with the computation of the capacity of Markov information sources.

3.1. Capacity of Markov information sources

In the previous sections we developed a measure of the information content of an information source that can be represented by a finite Markov model. As discussed, this measure of information content, the entropy, can be expressed in terms of the limiting state-transition probabilities and the conditional entropy of the states. In this section we address a problem that provides the key to answering many questions that will emerge in the chapters to follow.

Given a unifilar N-state Markov source with states {σ_1, ..., σ_N} and transition probabilities q̂_ij, we define the connection matrix D = {d_ij} of the source as follows. Let

  d_ij = 1 if q̂_ij > 0,
  d_ij = 0 if q̂_ij = 0,   i, j = 1, ..., N.

The actual values of the transition probabilities are irrelevant; the connection (or adjacency) matrix contains binary-valued elements, and it is formed by replacing the positive elements of the transition matrix by 1s. For an N-state source, the connection matrix D is defined by d_ij = 1 if a transition from state i to state j is allowable, and d_ij = 0 otherwise.

For a given connection matrix, we wish to choose the transition probabilities in such a way that the entropy

  H{X} = Σ_{i=1}^{N} π_i H_i

is maximized. Such a source is called maxentropic, and sequences generated by a maxentropic unifilar source are called maxentropic sequences. The maximum entropy of a unifilar Markov information source, given its connection matrix, is

  C = max H{X} = log_2 λ_max,

where λ_max is the largest eigenvalue of the connection matrix D. The existence of a positive eigenvalue and a corresponding eigenvector with positive elements is guaranteed by the Perron-Frobenius theorems. Essentially, there are two approaches to prove the preceding equation. One approach, provided by Shannon, is a straightforward routine, using Lagrange multipliers, of finding the extreme value of a function of several independent variables. The second proof of the above formula, to be followed here, is established by enumerating the number of distinct sequences that a Markov source can generate. The number of distinct sequences of length m+1, m > 0, emanating from state σ_i, denoted by N_i(m+1), equals the total of the numbers of sequences of unit length that emerge from σ_i and terminate in σ_j, multiplied (since the source is unifilar) by the number of sequences of length m that start in σ_j. Thus we find the following recurrence equation:

  N_i(m+1) = Σ_{j=1}^{N} d_ij N_j(m).

This is a system of N linear homogeneous difference equations with constant coefficients, and therefore the solution is a linear combination of exponentials λ^m. To find the admissible {λ}, we assume a solution of the form N_i(m) = y_i λ^m to obtain

  λ y_i = Σ_{j=1}^{N} d_ij y_j,

or, letting y^T = (y_1, ..., y_N), where the superscript T stands for transposition, we have

  λ y = D y.

Thus the allowable values of λ are the eigenvalues of the matrix D.
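A minimal sketch of this result: given a connection matrix D (a made-up example here, not the one from the book's figure), the capacity follows from the largest real eigenvalue, and it agrees with the growth rate of the sequence counts computed directly from the recurrence:

```python
import numpy as np

# Hypothetical connection (adjacency) matrix of a 3-state source.
D = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
], dtype=float)

lam_max = max(np.linalg.eigvals(D).real)   # Perron-Frobenius eigenvalue
C = np.log2(lam_max)
print("capacity C = log2(lambda_max) =", C)

# Cross-check: iterate N_i(m+1) = sum_j d_ij N_j(m) and look at the growth rate.
counts = np.ones(D.shape[0])
for m in range(40):
    counts = D @ counts
print("empirical growth rate:", np.log2(counts.sum()) / 40)  # close to C
```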

For large sequence length m we may approximate N_i(m) by

  N_i(m) ≈ a_i λ_max^m,

where a_i is a constant independent of m and λ_max is the largest real eigenvalue of the matrix D, or in other words, λ_max is the largest real root of the determinant equation

  det(D - λI) = 0.

The previous equation states that for large enough m the number of distinct sequences grows exponentially with the sequence length m; the growth factor is λ_max. This is not to say that N_i(m) is accurately determined by the exponential term when m is small. The maximum entropy of the noiseless channel may now be evaluated by invoking the definition of the capacity, or

  C = lim_{m→∞} (1/m) log_2 N_i(m) = log_2 λ_max.

The transition probabilities q_ij associated with the maximum entropy of the source can be found with the following reasoning. Let p = (p_1, ..., p_N)^T denote the eigenvector associated with the eigenvalue λ_max, or

  D p = λ_max p.

The state-transition probabilities that maximize the entropy are

  q_ij = d_ij p_j / (λ_max p_i).

To prove the above formula is a matter of substitution. According to the Perron-Frobenius theorems [5], the components of the eigenvector p are positive, and thus q_ij ≥ 0, i, j = 1, ..., N. Since p = (p_1, ..., p_N)^T is an eigenvector for λ_max, we conclude

  Σ_j q_ij = Σ_j d_ij p_j / (λ_max p_i) = 1,

and hence the matrix Q is indeed stochastic.
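A minimal sketch of this construction, continuing with the hypothetical connection matrix D used above: build Q from the Perron eigenvector and check numerically that its rows sum to one and that its entropy reaches log_2 λ_max:

```python
import numpy as np

D = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
], dtype=float)

# Perron eigenvalue and (right) eigenvector of D, taken with positive entries.
eigvals, eigvecs = np.linalg.eig(D)
k = np.argmax(eigvals.real)
lam_max = eigvals[k].real
p = np.abs(eigvecs[:, k].real)

# Maxentropic transition probabilities q_ij = d_ij * p_j / (lam_max * p_i).
Qmax = D * p[np.newaxis, :] / (lam_max * p[:, np.newaxis])
print(np.allclose(Qmax.sum(axis=1), 1.0))   # True: Qmax is stochastic

# Stationary distribution of Qmax (left Perron eigenvector, normalized).
w, v = np.linalg.eig(Qmax.T)
pi = np.abs(v[:, np.argmax(w.real)].real)
pi = pi / pi.sum()

# Entropy of the maxentropic source equals the capacity log2(lambda_max).
H_states = np.array([-sum(q * np.log2(q) for q in row if q > 0) for row in Qmax])
print(np.isclose(float(pi @ H_states), np.log2(lam_max)))  # True
```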

The entropy of a Markov information source is, according to the definition,

  H{X} = Σ_{i=1}^{N} π_i H_i = -Σ_i π_i Σ_j q_ij log_2 q_ij,

where H_i is the uncertainty of state σ_i and (π_1, ..., π_N) is the steady-state distribution. Since

  log_2 q_ij = log_2 p_j - log_2 λ_max - log_2 p_i   whenever d_ij = 1,

and

  Σ_i π_i q_ij = π_j,

we obtain

  H{X} = log_2 λ_max + Σ_i π_i log_2 p_i - Σ_j π_j log_2 p_j = log_2 λ_max.

This demonstrates that the transition probabilities given by q_ij = d_ij p_j / (λ_max p_i) do indeed maximize the entropy.

Example (continued): We revert to the three-state unifilar Markov chain with the transition probability matrix considered earlier. The adjacency matrix D is obtained by replacing its positive entries by 1s. From the characteristic equation det(D - λI) = 0 we conclude that the largest root is λ_max = 2, and the capacity is C = log_2 λ_max = 1. The eigenvector associated with the largest eigenvalue is p = (1, 3, 2)^T. The transition probabilities that maximize the entropy of the Markov information source are then found from q_ij = d_ij p_j / (λ_max p_i).
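The book's state diagram is not reproduced here, so the adjacency matrix in the sketch below is only an assumed stand-in chosen to be consistent with the stated eigenvalue λ_max = 2 and eigenvector p = (1, 3, 2)^T; it merely shows how the quoted numbers can be checked:

```python
import numpy as np

# Hypothetical adjacency matrix consistent with lambda_max = 2 and p = (1, 3, 2)^T;
# the actual matrix would be read from the (omitted) state-transition diagram.
D = np.array([
    [0, 0, 1],
    [1, 1, 1],
    [1, 1, 0],
], dtype=float)

p = np.array([1.0, 3.0, 2.0])
lam_max = 2.0
print(np.allclose(D @ p, lam_max * p))     # True: D p = lambda_max p
print(np.log2(lam_max))                    # capacity C = 1 bit per symbol

# Maxentropic transition probabilities q_ij = d_ij * p_j / (lam_max * p_i).
Qmax = D * p[np.newaxis, :] / (lam_max * p[:, np.newaxis])
print(Qmax)
print(np.allclose(Qmax.sum(axis=1), 1.0))  # True: rows sum to one
```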