Entropy of Markov Information Sources and Capacity of Discrete Input Constrained Channels (from Immink, Coding Techniques for Digital Recorders).

1. Entropy of Markov Chains

We have already introduced the notion of entropy in a conceptually simple situation: it was assumed that the symbols are independent and occur with fixed probabilities. That is, the occurrence of a specific symbol at a certain instant does not alter the probability of occurrence of symbols during any other symbol intervals. We need to extend the concept of entropy to more complicated structures, where symbols are not chosen independently but their probabilities of occurring depend on preceding symbols. It is to be emphasized that nearly all practical sources emit sequences of symbols that are statistically dependent. Sequences formed by the English language are an excellent example. Occurrence of the letter Q implies that the letter to follow is probably a U. Regardless of the form of the statistical dependence, or structure, among the successive source outputs, the effect is that the amount of information coming out of such a source is smaller than from a source emitting the same set of characters in independent sequences. The development of a model for sources with memory is the focus of the ensuing discussion.

In probability theory the notation Pr(A|B) means the probability of occurrence of event A given that event B has occurred. Many of the structures that will be encountered in the subsequent chapters can usefully be modeled in terms of a Markov chain. A Markov chain is a special type of stochastic process distinguished by a certain Markov property. A (discrete) Markov chain is defined as a discrete random process of the form

Z_1, Z_2, Z_3, ...,

where the variables Z_t are dependent discrete random variables taking values in the state alphabet S = {s_1, ..., s_N}, and the dependence satisfies the Markov condition

Pr(Z_t | Z_{t-1}, Z_{t-2}, ...) = Pr(Z_t | Z_{t-1}).

In words, the variable Z_t is independent of past samples Z_{t-2}, Z_{t-3}, ... if the value of Z_{t-1} is known.
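The Markov condition can be made concrete with a small simulation: the next state is drawn using only the current state's row of the transition matrix. The three-state matrix below is illustrative only (it is not the chain discussed later in the text), and the function name `sample_chain` is our own.

```python
import random

def sample_chain(Q, start, length, seed=42):
    """Sample a state sequence Z_1, ..., Z_length from a Markov chain.

    The next state depends only on the current state (the Markov
    condition): row Q[state] holds Pr(Z_{t+1} = j | Z_t = state).
    """
    rng = random.Random(seed)
    states = [start]
    for _ in range(length - 1):
        row = Q[states[-1]]
        states.append(rng.choices(range(len(Q)), weights=row)[0])
    return states

# Illustrative three-state transition matrix (each row sums to one).
Q = [[0.0, 0.5, 0.5],
     [0.7, 0.0, 0.3],
     [1.0, 0.0, 0.0]]
```

Every transition in a sampled sequence necessarily has positive probability under Q, which is a quick sanity check on any such sampler.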
A (homogeneous) Markov chain can be described by a transition probability matrix Q with elements

q_{ij} = Pr(Z_{t+1} = s_j | Z_t = s_i), i, j = 1, ..., N.

The transition probability matrix Q is a stochastic matrix, that is, its entries are nonnegative and the entries of each row sum to one. Any stochastic matrix constitutes a valid transition probability matrix. Imagine the process starts at time t = 1 by choosing an initial state in accordance with a specified probability distribution. If we are in state s_i at time t = 1, then the
process moves at t = 2 to a possibly new state; the quantity q_{ij} is the probability that the process will move to state s_j at time t = 2. If we are in state s_i at instant t = 2, we move to s_j at instant t = 3 with probability q_{ij}. This procedure is repeated ad infinitum. The state-transition diagram of a Markov chain, portrayed in the following figure (a), represents a Markov chain as a directed graph, where the states are embodied by the nodes or vertices of the graph; a transition between states is represented by a directed line, an edge, from the initial to the final state. The transition probabilities q_{ij} corresponding to the various transitions are shown marked along the lines of the graph. Another useful representation of a Markov chain is provided by a trellis (or lattice) diagram (see (b)).

Alternative representations of a three-state Markov chain: (a) state-transition diagram, (b) trellis diagram for the same chain. Permissible transitions from one state to the other are depicted by lines.

The trellis is a state diagram augmented by a time axis, so that it provides for easy visualization of how the states change with time. In theory, there are many types of Markov chains; here we restrict our attention to chains that are ergodic and regular. Roughly speaking, ergodicity means that from any state the chain can eventually reach any other state; regularity means that the Markov chain is non-periodic. In the practical structures that will be encountered, all these conditions hold.

We shall now take a closer look at the dynamics of a Markov chain. To that end, let w_i(t) = Pr(Z_t = i) represent the probability of being in state i at time t. Clearly,

Σ_{i=1}^N w_i(t) = 1.

The probability of being in state j at time t may be expressed in the state probabilities at instant t-1:

w_j(t) = Σ_{i=1}^N w_i(t-1) q_{ij}.

The previous equation suggests the use of matrices. If we introduce the state distribution vector
w(t) = (w_1(t), ..., w_N(t)), then the previous equation can succinctly be expressed in an elegant matrix/vector notation, thus

w(t) = w(t-1) Q.

By iteration we obtain

w(t) = w(1) Q^{t-1}.

In other words, the state distribution vector at time t is the product of the state distribution vector at time t = 1 and the (t-1)th power of the transition matrix. It is easy to see that Q^{t-1} is also a stochastic matrix. The previous formula is equivalent to the assertion that the n-step transition matrix is the nth power of the single-step transition matrix Q. We note also that Q^0 = I is the ordinary identity matrix. We shall concentrate now on the limiting behavior of the state distribution vector as t → ∞. In many cases of practical interest there is only one such limiting distribution vector, denoted by π = (π_1, ..., π_N). In the long run the state distribution vector converges to this equilibrium distribution vector from any valid initial state probability vector w(1), so

lim_{t→∞} w(t) = π.

The number π_i is called the steady, or stationary, state probability of state i. The equilibrium distribution vector can be obtained by solving the system of linear equations in the N unknowns π_1, ..., π_N:

π Q = π.

Only N-1 of these N equations are independent, so we solve the top N-1 along with the normalizing condition

Σ_{i=1}^N π_i = 1.

The proof is elementary: we note that if π Q = π, then π Q^{t-1} = π for every t. Decomposition of the initial state vector w(1) in terms of the eigenvectors of Q can be convenient to demonstrate the process of convergence. The matrix Q has N eigenvalues {λ_1, ..., λ_N}, which can be found by solving the characteristic equation

det(Q - λI) = 0,

where I is the identity matrix, and N (left) eigenvectors {u_1, ..., u_N}, each of which is a solution of the system

u_i Q = λ_i u_i.

Provided that the λ_i, i = 1, ..., N, are distinct, there are N independent eigenvectors, and the eigenvectors u_i, i = 1, ..., N, constitute a basis. The initial state vector may be written as
w(1) = Σ_{i=1}^N c_i u_i. We find the state distribution vector w(t) at instant t:

w(t) = w(1) Q^{t-1} = Σ_{i=1}^N c_i λ_i^{t-1} u_i.

If it is assumed that the eigenvalues are distinct, the {λ_i} can be ordered such that |λ_1| > |λ_2| > |λ_3|, etc. Combination of the previous formulae reveals that π is an eigenvector with unity eigenvalue, thus λ_1 = 1. We then have

w(t) = c_1 u_1 + Σ_{i=2}^N c_i λ_i^{t-1} u_i,

and convergence to π is assured since |λ_i| < 1 for i ≥ 2.

2. Entropy of Markov Information Sources

We are now in the position to describe a Markov information source. Given a finite Markov chain {Z_t} and a function ζ whose domain is the set of states of the chain and whose range is a finite set G, the source alphabet, the sequence {X_t}, where X_t = ζ(Z_t), is said to be the output of a Markov information source corresponding to the chain {Z_t} and the function ζ. In general, the number of states can be larger than the cardinality of the source alphabet, which means that one output symbol may correspond to more than one state. The essential feature of the Markov information source is that it provides for dependence between successive symbols, which introduces redundancy in the message sequence. Each symbol conveys less information than it is capable of conveying, since it is to some extent predictable from the preceding symbol.

In the foregoing description of an information source we assumed that the symbol emitted is solely a function of the state that is entered. This type of description is usually called a Moore-type Markov source. In a different description, called the Mealy-type Markov source, the symbols emitted are a function of the transitions of the Markov chain, X̂_t = ζ̂(Z_t, Z_{t+1}). In other words, a Mealy-type Markov source is obtained by labelling the edges of the directed graph that represents the Markov chain. Mealy- and Moore-type descriptions are equivalent. Let a Mealy-type machine be given. By defining a Markov information source with a state set composed of the pairs (i, j) with q_{ij} > 0, and label ζ̂(i, j) on the state (i, j), we obtain a Moore-type Markov source. The Moore-type model is referred to as the edge graph of the Mealy-type model.
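The convergence of w(t) to the equilibrium vector π described above can be sketched numerically by simply iterating w(t) = w(t-1)Q; for an ergodic, regular chain the same limit is reached from any starting distribution. The two-state matrix below and the function names are illustrative, not taken from the text; for this chain the equation πQ = π is easily solved by hand, giving π = (5/6, 1/6).

```python
def evolve(w, Q):
    """One step of the recursion w(t) = w(t-1) Q (row vector times matrix)."""
    n = len(Q)
    return [sum(w[i] * Q[i][j] for i in range(n)) for j in range(n)]

def equilibrium(Q, iters=1000):
    """Approximate pi with pi Q = pi by repeated multiplication.

    For an ergodic, regular chain, w(t) converges to pi from any
    valid initial distribution w(1); we start from the uniform one.
    """
    w = [1.0 / len(Q)] * len(Q)
    for _ in range(iters):
        w = evolve(w, Q)
    return w

# Illustrative two-state chain; pi = (5/6, 1/6) solves pi Q = pi.
Q = [[0.9, 0.1],
     [0.5, 0.5]]
```

The rate of convergence is governed by the second-largest eigenvalue modulus, in line with the eigenvector decomposition above (here λ_2 = 0.4, so the error shrinks by a factor 0.4 per step).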
An example of a Mealy-type information source and its Moore-type equivalent are shown in the following figure.
(a) Example of a Mealy-type two-state Markov information source, and (b) its four-state Moore-type counterpart.

The idea of a Markov source has enabled us to represent certain types of structure in streams of data. We next examine the information content, or entropy, of a sequence emitted by a Markov source. The entropy of a Markov information source is hard to compute in most cases. For a certain class of Markov information sources, termed unifilar Markov information sources, the computation may be greatly simplified. The word unifilar refers to the following property. Let a Markov information source with a set of states Σ = {1, 2, ..., N}, output alphabet G, and associated output function ζ(Z_t) be given. For each state i ∈ Σ, let s_{i_1}, ..., s_{i_{n_i}} be the states that can be reached in one step from i, that is, the states j such that q_{ij} > 0. We say j is a successor of i if q_{ij} > 0. The source is said to be unifilar if for each state i the symbols ζ(s_{i_1}), ..., ζ(s_{i_{n_i}}) are distinct. In other words, each successor of i must be associated with a distinct symbol. Provided this condition is met and the initial state of the Markov information source is known, the sequence of emitted symbols determines the sequence of states followed by the chain, and a simple formula is available for the entropy of the emitted X-process. Given a unifilar Markov source, as above, let s_{i_1}, ..., s_{i_{n_i}} be the successors of state i; then it is quite natural to define the uncertainty of state i as

H_i = H(q_{i i_1}, ..., q_{i i_{n_i}}),

with H(p_1, ..., p_M) defined as

H(p_1, ..., p_M) = - Σ_{m=1}^M p_m log_2 p_m.

Shannon defined the entropy of the unifilar Markov source as the average of these H_i, weighted in accordance with the steady-state probability of being in the state in question, that is, by the expression

H{X} = Σ_{i=1}^N π_i H_i.

Note that we use the notation H{X} to express the fact that we are considering the entropy of sequences {X} and not the function H(.). The next numerical example may serve to illustrate the theory.

Example: Consider the three-state unifilar Markov chain depicted in the figure above. From the diagram we may read the transition probability matrix.
What is the average probability of being in one of the three states? We find the following system of linear equations that govern the steady-state probabilities:

π Q = π, π_1 + π_2 + π_3 = 1,

from which we obtain π_1 and π_3 as multiples of π_2; together with the normalizing condition π_1 + π_2 + π_3 = 1 this yields the steady-state probabilities. The entropy of the information source is then found to equal

H{X} = Σ_{i=1}^3 π_i H_i.

In the next section we consider a problem which is central to the field of input-constrained channels. We focus on methods to compute the maximum amount of information that can be sent over an input-constrained channel per unit of time.

3. Capacity of Discrete Noiseless Channels

Shannon defined the capacity C of a discrete noiseless channel by

C = lim_{T→∞} (1/T) log_2 N(T),

where N(T) is the number of admissible signals of duration T. The problem of calculating the capacity of constrained channels is in essence a combinatorial problem, that is, finding the number of allowed sequences N(T). This fundamental definition will be worked out in a moment for some specific channel models. We start, since virtually all channel constraints can be modeled as such, with the computation of the capacity of Markov information sources.

3.1. Capacity of Markov information sources

In the previous sections we developed a measure of the information content of an information source that can be represented by a finite Markov model. As discussed, the measure of information content, entropy, can be expressed in terms of the limiting state-transition probabilities and the conditional entropy of the states. In this section we address a problem that provides the key to answering many questions that will emerge in the chapters to follow. Given a unifilar N-state Markov source with states {1, ..., N} and
transition probabilities q̂_{ij}, we define the connection matrix D = {d_{ij}} of the source as follows:

d_{ij} = 1 if q̂_{ij} > 0,
d_{ij} = 0 if q̂_{ij} = 0,    i, j = 1, ..., N.

The actual values of the transition probabilities are irrelevant; the connection (or adjacency) matrix contains binary-valued elements, and it is formed by replacing the positive elements of the transition matrix by 1's. For an N-state source, the connection matrix D is defined by d_{ij} = 1 if a transition from state i to state j is allowable, and d_{ij} = 0 otherwise. For a given connection matrix, we wish to choose the transition probabilities in such a way that the entropy

H{X} = Σ_{i=1}^N π_i H_i

is maximized. Such a source is called maxentropic, and sequences generated by a maxentropic unifilar source are called maxentropic sequences. The maximum entropy of a unifilar Markov information source, given its connection matrix, is

C = max H{X} = log_2 λ_max,

where λ_max is the largest eigenvalue of the connection matrix D. The existence of a positive eigenvalue and a corresponding eigenvector with positive elements is guaranteed by the Perron-Frobenius theorems. Essentially, there are two approaches to prove the preceding equation. One approach, provided by Shannon, is a straightforward routine, using Lagrange multipliers, of finding the extreme value of a function of several independent variables. The second proof of the above formula, to be followed here, is established by enumerating the number of distinct sequences that a Markov source can generate. The number of distinct sequences of length m + 1, m > 0, emanating from state i, denoted by N_i(m + 1), equals the total, over the states s_j that can be reached in one step from s_i, of the number of sequences of length m that start in s_j (since the source is unifilar, these contributions are distinct). Thus we find the following recurrence equation:

N_i(m + 1) = Σ_{j=1}^N d_{ij} N_j(m).

This is a system of N linear homogeneous difference equations with constant coefficients, and therefore the solution is a linear combination of exponentials λ^m. To find the particular {λ
}, we assume a solution of the form N_i(m) = y_i λ^m to obtain

λ y_i = Σ_{j=1}^N d_{ij} y_j,

or, letting y^T = (y_1, ..., y_N), where the superscript T stands for transposition, we have

λ y = D y.
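The eigenvalue problem just obtained can be solved numerically; a minimal sketch using power iteration (normalizing by the largest component at each step) is given below. The connection matrix chosen as an example is an assumption of ours: it describes binary sequences in which two adjacent ones are forbidden, for which λ_max is the golden ratio, giving a capacity of log_2(1.618...) ≈ 0.694 bit per symbol.

```python
def largest_eigenvalue(D, iters=200):
    """Power iteration for the Perron (largest real) eigenvalue of a
    nonnegative connection matrix D, with the eigenvector scaled so
    that its largest component equals one."""
    n = len(D)
    y = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        z = [sum(D[i][j] * y[j] for j in range(n)) for i in range(n)]
        lam = max(z)           # current eigenvalue estimate
        y = [v / lam for v in z]
    return lam, y

# Connection matrix forbidding two adjacent ones (illustrative example):
# from state 0 (last bit 0) both bits are allowed; from state 1 (last
# bit 1) only a 0 may follow.
D = [[1, 1],
     [1, 0]]
```

Power iteration converges here because the Perron eigenvalue of such a nonnegative, irreducible matrix strictly dominates the other eigenvalues in modulus.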
Thus the allowable {λ} are the eigenvalues of the matrix D. For large sequence length m we may approximate N_i(m) by

N_i(m) ≈ a_i λ_max^m,

where a_i is a constant independent of m and λ_max is the largest real eigenvalue of the matrix D, or, in other words, λ_max is the largest real root of the determinant equation

det(D - λI) = 0.

The previous equation states that for large enough m the number of distinct sequences grows exponentially with the sequence length m; the growth factor is λ_max. This is not to say that N_i(m) is accurately determined by the exponential term when m is small. The maximum entropy of the noiseless channel may be evaluated by invoking the definition of the capacity:

C = lim_{m→∞} (1/m) log_2 N_i(m) = log_2 λ_max.

The transition probabilities q_{ij} associated with the maximum entropy of the source can be found with the following reasoning. Let p = (p_1, ..., p_N)^T denote the eigenvector associated with the eigenvalue λ_max, or

D p = λ_max p.

The state-transition probabilities that maximize the entropy are

q_{ij} = d_{ij} p_j / (λ_max p_i).

To prove the above formula is a matter of substitution. According to the Perron-Frobenius theorems [5], the components of the eigenvector p are positive, and thus q_{ij} ≥ 0, i, j = 1, ..., N. Since p = (p_1, ..., p_N)^T is an eigenvector for λ_max we conclude

Σ_j q_{ij} = Σ_j d_{ij} p_j / (λ_max p_i) = λ_max p_i / (λ_max p_i) = 1,

and hence the matrix Q is indeed stochastic. The entropy of a Markov information source is, according to the definition,

H{X} = Σ_{i=1}^N π_i H_i,

where H_i is the uncertainty of state s_i and (π_1, ..., π_N) is the steady-state distribution. Thus,
H_i = - Σ_j q_{ij} log_2 q_{ij} = log_2 λ_max - Σ_j q_{ij} log_2 (p_j / p_i).

Since Σ_i π_i q_{ij} = π_j and Σ_i π_i = 1, we obtain

H{X} = Σ_i π_i H_i = log_2 λ_max - Σ_j π_j log_2 p_j + Σ_i π_i log_2 p_i = log_2 λ_max.

This demonstrates that the transition probabilities given by q_{ij} = d_{ij} p_j / (λ_max p_i) are indeed maximizing the entropy.

Example (Continued): We revert to the three-state unifilar Markov chain with the transition probability matrix given earlier. The adjacency matrix D is obtained by replacing the positive entries of the transition matrix by 1's. The characteristic equation det(D - λI) = 0 yields the largest root λ_max = 2, and the capacity is C = log_2 λ_max = 1. The eigenvector associated with the largest eigenvalue is p = (1, 3, 2)^T. The transition probabilities that maximize the entropy of the Markov information source are found with q_{ij} = d_{ij} p_j / (λ_max p_i).
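The worked example can be checked numerically. The adjacency matrix below is not taken from the (unreproduced) figure; it is reconstructed under the assumption that it must satisfy D p = 2 p with p = (1, 3, 2)^T and 0/1 entries, so it should be verified against the original diagram. The sketch builds the maxentropic transition probabilities q_ij = d_ij p_j / (λ_max p_i), finds the steady-state distribution by iteration, and confirms that the resulting entropy equals the capacity C = log_2 2 = 1.

```python
from math import log2

# Assumed adjacency matrix, consistent with the quoted results
# lambda_max = 2 and p = (1, 3, 2)^T; check against the figure.
D = [[0, 0, 1],
     [1, 1, 1],
     [1, 1, 0]]
lam = 2.0
p = [1.0, 3.0, 2.0]

# Verify D p = lambda_max p for the assumed matrix.
assert all(sum(D[i][j] * p[j] for j in range(3)) == lam * p[i]
           for i in range(3))

# Maxentropic transition probabilities q_ij = d_ij * p_j / (lam * p_i).
Q = [[D[i][j] * p[j] / (lam * p[i]) for j in range(3)] for i in range(3)]

# Steady-state distribution via iteration of w(t) = w(t-1) Q.
w = [1 / 3] * 3
for _ in range(500):
    w = [sum(w[i] * Q[i][j] for i in range(3)) for j in range(3)]

# Entropy H{X} = sum_i pi_i H_i; it should equal log2(lambda_max) = 1.
H = sum(w[i] * sum(-q * log2(q) for q in Q[i] if q > 0) for i in range(3))
```

Under these assumptions the computation reproduces H{X} = 1 bit per symbol, matching the capacity claimed in the example.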