
1 MARKOV CHAINS

A finite-state Markov chain is a sequence $S_0, S_1, \ldots$ of discrete cv's from a finite alphabet $\mathcal{S}$, where $q_0(s)$ is a pmf on $S_0$ and, for $n \ge 1$,

$$Q(s \mid s') = \Pr(S_n = s \mid S_{n-1} = s') = \Pr(S_n = s \mid S_{n-1} = s', S_{n-2} = s_{n-2}, \ldots, S_0 = s_0)$$

for all choices of $s_{n-2}, \ldots, s_0$. We use the states to represent the memory in a discrete source with memory.

2 A Markov chain graph has a node for each state and a directed edge for each transition of positive probability. The label on an edge is the probability of that transition.

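3 A Markov chain has period $p$ if the states can be partitioned into $p$ classes, $1, 2, \ldots, p$, such that transitions from each class $i$ go only to class $i+1$ (and from class $p$ only to class 1), and $p$ is the largest integer for which such a partition exists. A chain with period 1 is aperiodic.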

4 A finite-state Markov chain is ergodic if it is aperiodic (period 1) and each state can be reached by some path from each other state. An ergodic chain has steady-state probabilities $q(s)$ satisfying

$$q(s) = \sum_{s'} q(s')\, Q(s \mid s'), \qquad \sum_{s} q(s) = 1.$$

[Figure: graph of an ergodic chain]
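The steady-state equations can be solved numerically by repeatedly applying the transition probabilities to any starting pmf. The following is a minimal sketch, not part of the original slides; the two-state matrix `Q` is a hypothetical example, and the row convention `Q[a][b] = Pr(next = b | current = a)` is an assumption of this sketch.

```python
def steady_state(Q, iters=1000):
    """Approximate the steady-state pmf of an ergodic chain by power iteration.

    Q[a][b] is Pr(next state = b | current state = a), so each row sums to 1.
    """
    n = len(Q)
    q = [1.0 / n] * n  # any starting pmf works for an ergodic chain
    for _ in range(iters):
        # One step of q(s) = sum over s' of q(s') Q(s | s')
        q = [sum(q[a] * Q[a][b] for a in range(n)) for b in range(n)]
    return q


# Hypothetical two-state chain (illustrative values only).
Q = [[0.9, 0.1],
     [0.5, 0.5]]
q = steady_state(Q)
print(q)  # close to [5/6, 1/6]
```

Power iteration converges here because the chain is ergodic; for a periodic chain the iterates would oscillate instead.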

5 A Markov source is a finite-state Markov chain with each transition labeled by a source symbol from alphabet $\mathcal{X}$. For each state, each outgoing transition has a different label. Thus the next state and the symbol specify each other.

[Figure: state diagram of a Markov source; each edge is labeled "symbol; probability"]

6 Coding for Markov sources

Simplest approach: use a separate prefix-free code for each prior state. If $S_{n-1} = s$, then encode $X_n$ with the prefix-free code for $s$. The codeword lengths $l(x, s)$ are chosen for the pmf $P(x \mid s)$ and satisfy the Kraft inequality

$$\sum_{x} 2^{-l(x,s)} \le 1 \quad \text{for each } s.$$

The optimal code given prior state $s$ satisfies

$$H[X \mid s] \le L_{\min}(s) < H[X \mid s] + 1,$$

where

$$H[X \mid s] = -\sum_{x \in \mathcal{X}} P(x \mid s) \log P(x \mid s).$$
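As a rough check of the per-state bounds, the sketch below picks codeword lengths $l(x,s) = \lceil \log_2 (1/P(x \mid s)) \rceil$ for each state. These are Shannon-code lengths, a simpler stand-in for the optimal (Huffman) lengths, so they only show that some prefix-free code meets the Kraft inequality and the bound $H[X \mid s] \le L(s) < H[X \mid s] + 1$. The state names and probabilities in `P` are hypothetical.

```python
from math import ceil, log2

# Hypothetical conditional pmfs P(x | s) for two states (illustrative values).
P = {
    "s0": {"a": 0.5, "b": 0.25, "c": 0.25},
    "s1": {"a": 0.9, "b": 0.05, "c": 0.05},
}


def shannon_lengths(pmf):
    """Lengths ceil(log2(1/p)), at least 1 bit (not the optimal Huffman lengths)."""
    return {x: max(1, ceil(-log2(p))) for x, p in pmf.items()}


lengths = {s: shannon_lengths(pmf) for s, pmf in P.items()}
report = {}
for s, pmf in P.items():
    kraft = sum(2.0 ** -lengths[s][x] for x in pmf)          # must be <= 1
    avg_len = sum(p * lengths[s][x] for x, p in pmf.items())  # L(s)
    entropy = -sum(p * log2(p) for p in pmf.values())         # H[X | s]
    report[s] = (kraft, avg_len, entropy)
    print(s, lengths[s], kraft, avg_len, entropy)
```

A Huffman code built per state would give $L_{\min}(s)$, which can only be shorter than the lengths used here.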

7 If the pmf on $S_0$ is the steady-state pmf, $\{q(s)\}$, then the chain remains in steady state, and

$$H[X \mid S] \le L_{\min} < H[X \mid S] + 1, \qquad (1)$$

where

$$L_{\min} = \sum_{s \in \mathcal{S}} q(s)\, L_{\min}(s) \quad \text{and} \quad H[X \mid S] = \sum_{s \in \mathcal{S}} q(s)\, H[X \mid s].$$

The encoder transmits $s_0$ followed by the codeword for $x_1$ using the code for $s_0$. This specifies $s_1$, and $x_2$ is encoded with the code for $s_1$, etc. This is prefix-free and can be decoded instantaneously.

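8 Conditional Entropy

$H[X \mid S]$ for a Markov source is like $H[X]$ for a DMS.

$$H[X \mid S] = \sum_{s \in \mathcal{S}} \sum_{x \in \mathcal{X}} q(s) P(x \mid s) \log \frac{1}{P(x \mid s)}$$

Note that

$$H[XS] = \sum_{s,x} q(s) P(x \mid s) \log \frac{1}{q(s) P(x \mid s)} = H[S] + H[X \mid S].$$

Recall that

$$H[XS] \le H[S] + H[X],$$

so

$$H[X \mid S] \le H[X].$$

This is general for any random symbols.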
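The identities on this slide can be checked numerically. The sketch below uses a hypothetical two-state, two-symbol source; the values of `q` and `P` are assumptions chosen for illustration, with `q` taken to be the steady-state pmf of that source.

```python
from math import log2


def entropy(pmf):
    """Entropy in bits of a list of probabilities (zero terms are skipped)."""
    return -sum(p * log2(p) for p in pmf if p > 0)


# Hypothetical source: q[s] is the steady-state pmf, P[s][x] = P(x | s).
q = [5 / 6, 1 / 6]
P = [[0.9, 0.1],
     [0.5, 0.5]]
states, symbols = range(len(q)), range(len(P[0]))

# H[X | S] = sum over s of q(s) H[X | s]
h_cond = sum(q[s] * entropy(P[s]) for s in states)

# H[X] from the marginal pmf of X
p_x = [sum(q[s] * P[s][x] for s in states) for x in symbols]
h_marg = entropy(p_x)

# H[XS] from the joint pmf q(s) P(x | s)
h_joint = entropy([q[s] * P[s][x] for s in states for x in symbols])

print(h_cond, h_marg, h_joint)
```

For these values $H[X \mid S]$ is about 0.557 bits while $H[X]$ is about 0.650 bits, so conditioning on the state lowers the entropy, as the inequality requires.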

9 Suppose we use 2-to-variable-length codes for each state.

$$H[X_1 X_2 \mid S_0] = \sum_{s_0} q(s_0)\, H[X_1 X_2 \mid s_0] = \sum_{s_0} q(s_0)\, H[S_1 S_2 \mid s_0] = \sum_{s_0} q(s_0) \big( H[S_1 \mid s_0] + H[S_2 \mid S_1, s_0] \big)$$

$$H[S_2 \mid S_1, s_0] = \sum_{s_1} Q(s_1 \mid s_0)\, H[S_2 \mid s_1]$$

$$\sum_{s_0} q(s_0)\, H[S_2 \mid S_1, s_0] = \sum_{s_1} q(s_1)\, H[S_2 \mid s_1] = H[S_2 \mid S_1]$$

$$H[X_1 X_2 \mid S_0] = H[S_1 \mid S_0] + H[S_2 \mid S_1] = 2 H[X \mid S]$$

10 In the same way,

$$H[X_1, X_2, \ldots, X_n \mid S_0] = n H[X \mid S].$$

By using n-to-variable-length codes,

$$H[X \mid S] \le L_{\min,n} < H[X \mid S] + 1/n.$$

Thus, for Markov sources, $H[X \mid S]$ is asymptotically achievable.

The AEP also holds for Markov sources.

$L \le H[X \mid S] - \varepsilon$ cannot be achieved, either in expected length or fixed length, with low probability of failure.

11 THE LZ77 UNIVERSAL ALGORITHM

A universal data compressor operates without source statistics. We describe a standard string-matching algorithm for this, due to Ziv and Lempel (LZ77).

In principle, a universal algorithm attempts to model the source and to encode it simultaneously. With instantaneous decodability, the decoder also knows the past, and thus can track the encoder.

12 The objective (achieved by LZ77): given the output from a given probability model (say a Markov source), $L$ should be almost as small as for an algorithm designed for that model. Also, the algorithm should compress well in the absence of any ordinary kind of statistical structure. It should deal with gradually changing statistics.

13 Let $x_1, x_2, \ldots$ be the output of a source with known alphabet $\mathcal{X}$ of size $M$. Let $x_m^n$ denote the string $x_m, x_{m+1}, \ldots, x_n$.

The window is the $w = 2^k$ most recently encoded source symbols, for some $k$ to be selected.

The Lempel-Ziv algorithm matches the longest string of yet unencoded symbols with strings starting in the window.

[Figure: a window of $w$ symbols ending at the pointer $P$; the longest match has length $n = 3$ and starts $u = 7$ symbols back]

14 The compression algorithm:

1. Encode the first $w$ symbols without compression, using $\lceil \log M \rceil$ binary digits per symbol. (This gets amortized over the sequence, so we don't care about it.)

2. Set the pointer $P = w$. (As the algorithm runs, $x_1^P$ is the already encoded string of source symbols and $x_{P+1}$ is the first new symbol to be encoded.)

15 3. Find the largest $n \ge 2$ such that $x_{P+1}^{P+n} = x_{P+1-u}^{P+n-u}$ for some $u$, $1 \le u \le w$. Set $n = 1$ otherwise.

[Figure: two examples. In the first, the match has length $n = 3$ and starts $u = 7$ symbols back. In the second, the match has length $n = 4$ and starts $u = 2$ symbols back, so it overlaps the string being encoded.]

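16 (4) Encode the match size $n$ into a codeword from the so-called unary-binary code. The positive integer $n$ is encoded into the binary representation of $n$, preceded by a prefix of $\lfloor \log_2 n \rfloor$ zeroes; i.e.,

n   prefix   base 2 exp.   codeword
1            1             1
2   0        10            010
3   0        11            011
4   00       100           00100
5   00       101           00101
6   00       110           00110
7   00       111           00111
8   000      1000          0001000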
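The unary-binary code described here can be sketched in a few lines. The function names below are hypothetical; the only fact used is that the binary representation of $n$ has $\lfloor \log_2 n \rfloor + 1$ digits, so the prefix length is one less than that.

```python
def unary_binary_encode(n):
    """Return the unary-binary codeword of a positive integer n as a bit string."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    binary = bin(n)[2:]                 # binary representation, leading digit is 1
    return "0" * (len(binary) - 1) + binary


def unary_binary_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos]; return (n, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":     # count the prefix of zeroes
        zeros += 1
    end = pos + 2 * zeros + 1           # prefix plus zeros + 1 binary digits
    return int(bits[pos + zeros:end], 2), end


for n in range(1, 9):
    print(n, unary_binary_encode(n))
```

Because the prefix announces how many binary digits follow, the code is prefix-free and a decoder can read codewords back to back without separators.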

17 (5) If $n > 1$, encode the positive integer $u \le w$ using a fixed-length code of length $\log w$ bits. (At this point the decoder knows $n$, and can simply count back by $u$ in the previously decoded string to find the appropriate n-tuple, even if there is overlap as above.) If $n = 1$, encode the single letter without compression.

(6) Set the pointer $P$ to $P + n$ and go to step (3). (Iterate forever.)
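The six steps can be put together as a small encoder and matching decoder. This is a minimal sketch under stated assumptions, not an efficient or authoritative implementation: the function names are hypothetical, the match search is brute force, $u$ is sent as the $k$-bit value $u - 1$, and the decoder is told the source length because the steps above define no end marker.

```python
from math import ceil, log2


def unary_binary(n):
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def lz77_encode(x, alphabet, k):
    """Encode string x into a bit string using a window of w = 2**k symbols."""
    w = 2 ** k
    if len(x) < w:
        raise ValueError("need at least w symbols")
    sym_bits = max(1, ceil(log2(len(alphabet))))
    index = {a: i for i, a in enumerate(alphabet)}
    raw = lambda a: format(index[a], f"0{sym_bits}b")
    out = [raw(a) for a in x[:w]]            # step 1: first w symbols uncompressed
    P = w                                    # step 2: x[:P] is already encoded
    while P < len(x):
        best_n, best_u = 1, 0                # step 3: longest match, n = 1 if none
        for u in range(1, w + 1):
            n = 0
            while P + n < len(x) and x[P + n] == x[P + n - u]:
                n += 1                       # the match may overlap position P
            if n >= 2 and n > best_n:
                best_n, best_u = n, u
        out.append(unary_binary(best_n))     # step 4: match size
        if best_n > 1:
            out.append(format(best_u - 1, f"0{k}b"))  # step 5: u in log w bits
        else:
            out.append(raw(x[P]))            # step 5: single letter uncompressed
        P += best_n                          # step 6
    return "".join(out)


def lz77_decode(bits, alphabet, k, length):
    """Invert lz77_encode; length is the number of source symbols (assumed known)."""
    w = 2 ** k
    sym_bits = max(1, ceil(log2(len(alphabet))))
    pos, x = 0, []

    def read(nbits):
        nonlocal pos
        value = int(bits[pos:pos + nbits], 2)
        pos += nbits
        return value

    for _ in range(w):
        x.append(alphabet[read(sym_bits)])
    while len(x) < length:
        zeros = 0
        while bits[pos] == "0":              # unary prefix of the match size
            zeros += 1
            pos += 1
        n = read(zeros + 1)
        if n > 1:
            u = read(k) + 1
            for _ in range(n):
                x.append(x[-u])              # copying one symbol at a time handles overlap
        else:
            x.append(alphabet[read(sym_bits)])
    return "".join(x)
```

For example, `lz77_encode("abababababababab", "ab", 2)` sends the first four symbols raw and then a single overlapping match of length 12, for 13 bits in total.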

18 This is a variable-to-variable-length encoding. A segment of length $n > 1$ is encoded into $l(n) + \log w$ binary digits, where $l(n)$ is the length of the unary-binary codeword for $n$:

$$l(1) = 1, \quad l(2) = l(3) = 3, \quad l(4) = l(5) = 5, \quad l(n) = 2 \lfloor \log_2 n \rfloor + 1.$$

This is universal since no knowledge of source statistics is used.

The typical match length is about $n \approx \log w / H[X \mid S]$.

19 QUANTIZATION

[Figure: block diagram. Input waveform → sampler → (analog sequence) → quantizer → (symbol sequence) → discrete encoder → reliable binary channel → discrete decoder → table lookup → analog filter → output waveform]

20 Converting real numbers to binary strings requires a mapping from $\mathbb{R}$ to a discrete alphabet. This is called scalar quantization.

Converting real n-tuples to binary strings requires mapping $\mathbb{R}^n$ to a discrete alphabet. This is called vector quantization.

Scalar quantization encodes each term of the source sequence separately. Vector quantization segments the source sequence into n-blocks, which are quantized together.

21 A scalar quantizer partitions $\mathbb{R}$ into $M$ regions $R_1, \ldots, R_M$. Each region $R_j$ is mapped to a symbol $a_j$ called the representation point for $R_j$.

[Figure: the real line divided into regions $R_1, \ldots, R_6$ by boundaries $b_1, \ldots, b_5$, with representation points $a_1, \ldots, a_6$]

Each source value $u \in R_j$ is mapped into the same representation point $a_j$. After discrete coding and channel transmission, the receiver sees $a_j$ and the distortion is $u - a_j$.

22 View the source value $u$ as a sample value of a random variable $U$. The representation $a_j$ is a sample value of the rv $V$, where $V$ is the quantization of $U$. That is, if $U \in R_j$, then $V = a_j$.

The source sequence is $U_1, U_2, \ldots$. The representation is $V_1, V_2, \ldots$, where if $U_k \in R_j$, then $V_k = a_j$.

Assume that $U_1, U_2, \ldots$ is a memoryless source, which means that $U_1, U_2, \ldots$ is iid. For a scalar quantizer, we can look at just a single $U$ and a single $V$.

23 We are almost always interested in the mean square distortion of a scalar quantizer:

$$\text{MSE} = E[(U - V)^2]$$

Interesting problem: for a given probability density $f_U(u)$ and a given alphabet size $M$, choose $\{R_j, 1 \le j \le M\}$ and $\{a_j, 1 \le j \le M\}$ to minimize MSE.

24 Subproblem 1: Given representation points $\{a_j\}$, choose the regions $\{R_j\}$ to minimize MSE.

This is easy: for source output $u$, the squared error to $a_j$ is $|u - a_j|^2$. Minimize by choosing the closest $a_j$.

Thus $R_j$ is the region closer to $a_j$ than to any other $a_{j'}$. $R_j$ is bounded by

$$b_{j-1} = \frac{a_j + a_{j-1}}{2}, \qquad b_j = \frac{a_j + a_{j+1}}{2}.$$

MSE regions must be intervals.

25 Subproblem 2: Given interval regions $\{R_j\}$, choose the representation points $\{a_j\}$ to minimize MSE.

Given $U \in R_j$, the conditional density of $U$ is $f_j(u) = f_U(u)/Q_j$ for $u \in R_j$, where $Q_j = \Pr(U \in R_j)$.

Let $U(j)$ be an rv with density $f_j(u)$.

$$E[|U(j) - a_j|^2] = \sigma^2_{U(j)} + |E[U(j)] - a_j|^2$$

Choose $a_j = E[U(j)]$.

26 An optimal scalar quantizer must satisfy both $b_j = (a_j + a_{j+1})/2$ and $a_j = E[U(j)]$.

The Lloyd-Max algorithm:

1. Choose $a_1 < a_2 < \cdots < a_M$.

2. Set $b_j = (a_j + a_{j+1})/2$ for $1 \le j \le M - 1$.

3. Set $a_j = E[U(j)]$, where $R_j = (b_{j-1}, b_j]$ for $1 \le j \le M$ (with $b_0 = -\infty$ and $b_M = \infty$).

4. Iterate on 2 and 3 until improvement is negligible.

The MSE is non-negative and non-increasing with iterations, so it reaches a limit.
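The iteration can be illustrated on a finite set of samples. This is a minimal sketch under an assumption not made on the slide: the density $f_U$ is replaced by an empirical sample set, so each conditional mean $E[U(j)]$ becomes a sample average. The function names and the toy data are hypothetical.

```python
def lloyd_max(samples, points, iters=100):
    """Run Lloyd-Max steps on samples, starting from the given representation points."""
    a = sorted(points)                                   # step 1: a_1 < ... < a_M
    for _ in range(iters):
        # step 2: boundaries halfway between neighboring points
        b = [(a[j] + a[j + 1]) / 2 for j in range(len(a) - 1)]
        # assign each sample to its region R_j = (b_{j-1}, b_j]
        cells = [[] for _ in a]
        for u in samples:
            j = sum(u > bj for bj in b)                  # number of boundaries below u
            cells[j].append(u)
        # step 3: each point becomes the mean of its region (kept if the region is empty)
        a = [sum(c) / len(c) if c else a[j] for j, c in enumerate(cells)]
    return a


def mse(samples, a):
    """Mean square distortion when each sample maps to its closest point."""
    return sum(min((u - aj) ** 2 for aj in a) for u in samples) / len(samples)


samples = [0, 1, 2, 10, 11, 12]       # toy data with two obvious clusters
start = [0, 1]
final = lloyd_max(samples, start)
print(final, mse(samples, start), mse(samples, final))
```

A fixed iteration count stands in for "until improvement is negligible"; a practical version would stop when the MSE change falls below a tolerance. The limit reached depends on the starting points and need not be the global minimum.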

27 MIT OpenCourseWare Principles of Digital Communication I Fall 2009 For information about citing these materials or our Terms of Use, visit:
