Information Theory Lecture 5 Entropy rate and Markov sources STEFAN HÖST

Universal Source Coding

Huffman coding is optimal, so what is the problem? In the previous coding schemes (Huffman and Shannon-Fano) it was assumed that
- the source statistics are known, and
- the source symbols are i.i.d.
Normally this is not the case. How much can the source then be compressed, and how can that be achieved?

Random process

Definition (Random process): A random process $\{X_i\}_{i=1}^{n}$ is a sequence of random variables. There can be an arbitrary dependence among the variables, and the process is characterized by the joint probability function
$P(X_1, X_2, \ldots, X_n = x_1, x_2, \ldots, x_n) = p(x_1, x_2, \ldots, x_n), \quad n = 1, 2, \ldots$

Definition (Stationary random process): A random process is stationary if it is invariant in time,
$P(X_1, \ldots, X_n = x_1, \ldots, x_n) = P(X_{q+1}, \ldots, X_{q+n} = x_1, \ldots, x_n)$
for all time shifts $q$.

Entropy rate

Definition: The entropy rate of a random process is defined as
$H_\infty(X) = \lim_{n\to\infty} \frac{1}{n} H(X_1 X_2 \cdots X_n)$
Define the alternative entropy rate for a random process as
$H(X|X^\infty) = \lim_{n\to\infty} H(X_n | X_1 X_2 \cdots X_{n-1})$

Theorem: The entropy rate and the alternative entropy rate are equivalent,
$H_\infty(X) = H(X|X^\infty)$
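
As a numerical illustration (not from the lecture), the sketch below takes a simple two-state source with memory, a Markov chain with an assumed transition matrix (Markov chains are defined formally later in the lecture), and computes both $\frac{1}{n} H(X_1 \cdots X_n)$ and $H(X_n | X_1 \cdots X_{n-1})$ by brute-force enumeration of all length-$n$ sequences. Both quantities approach the same limit, as the theorem states.

```python
# A minimal sketch (assumed two-state Markov source, not from the lecture):
# compare (1/n) H(X_1...X_n) with H(X_n | X_1...X_{n-1}) for growing n.
import numpy as np
from itertools import product

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])            # assumed transition probabilities
pi0 = np.array([0.5, 0.5])            # assumed initial distribution

def seq_prob(seq):
    """p(x_1,...,x_n) via the chain rule for a Markov source."""
    p = pi0[seq[0]]
    for a, b in zip(seq, seq[1:]):
        p *= P[a, b]
    return p

def block_entropy(n):
    """H(X_1...X_n) by enumerating all 2^n sequences."""
    probs = np.array([seq_prob(s) for s in product((0, 1), repeat=n)])
    probs = probs[probs > 0]
    return -np.sum(probs * np.log2(probs))

for n in (2, 4, 8, 12):
    Hn = block_entropy(n)
    cond = Hn - block_entropy(n - 1)   # H(X_n | X_1...X_{n-1})
    print(f"n={n:2d}:  H(X^n)/n = {Hn/n:.4f}   H(X_n|past) = {cond:.4f}")
```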

Entropy rate

Theorem: For a stationary stochastic process the entropy rate is bounded by
$H_\infty(X) \le H(X) \le \log k$

[Figure: $H(X_n | X_1 \cdots X_{n-1})$ as a function of $n$, starting at $H(X)$ below $\log k$ and decreasing towards $H_\infty(X)$.]

Source coding for random processes

Optimal coding of a process: Let $X = (X_1, \ldots, X_N)$ be a vector of $N$ symbols from a random process. Use an optimal source code to encode the vector. Then
$H(X_1 \ldots X_N) \le L^{(N)} < H(X_1 \ldots X_N) + 1$
which gives, for the average codeword length per symbol $\bar{L} = \frac{1}{N} L^{(N)}$,
$\frac{1}{N} H(X_1 \ldots X_N) \le \bar{L} < \frac{1}{N} H(X_1 \ldots X_N) + \frac{1}{N}$
In the limit as $N \to \infty$ the optimal codeword length per symbol becomes
$\lim_{N\to\infty} \bar{L} = H_\infty(X)$
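
As a sketch of what this means in practice (not from the lecture, and assuming an i.i.d. binary source with P(X=1) = 0.2 purely for illustration), the snippet below Huffman-codes blocks of $N$ symbols and prints the average codeword length per symbol, which approaches the entropy rate, here simply $H(X)$ since the source is memoryless.

```python
# A minimal sketch: block Huffman coding of an assumed i.i.d. binary source.
import heapq
from itertools import product
from math import log2

p1 = 0.2
H = -p1 * log2(p1) - (1 - p1) * log2(1 - p1)    # H(X) = h(0.2) ~ 0.7219 bits

def huffman_avg_length(probs):
    """Expected codeword length of a binary Huffman code for the given pmf."""
    heap = [(p, i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    total, counter = 0.0, len(probs)
    while len(heap) > 1:
        p_a, _ = heapq.heappop(heap)
        p_b, _ = heapq.heappop(heap)
        total += p_a + p_b      # each merge adds one bit to every leaf below it
        counter += 1
        heapq.heappush(heap, (p_a + p_b, counter))
    return total

for N in (1, 2, 4, 8):
    block_probs = [p1 ** sum(b) * (1 - p1) ** (N - sum(b))
                   for b in product((0, 1), repeat=N)]
    print(f"N={N}: L/N = {huffman_avg_length(block_probs)/N:.4f}  (H(X) = {H:.4f})")
```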

Markov chain

Definition (Markov chain): A Markov chain, or Markov process, is a random process with unit memory,
$P(x_n | x_1, \ldots, x_{n-1}) = P(x_n | x_{n-1})$, for all $x_i$.

Definition (Stationary): A Markov chain is stationary (time invariant) if the conditional probabilities are independent of the time,
$P(X_n = x_a | X_{n-1} = x_b) = P(X_{n+l} = x_a | X_{n+l-1} = x_b)$
for all relevant $n$, $l$, $x_a$ and $x_b$.

Markov chain

Theorem: For a Markov chain the joint probability function is
$p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i | x_1, x_2, \ldots, x_{i-1}) = \prod_{i=1}^{n} p(x_i | x_{i-1}) = p(x_1) p(x_2|x_1) p(x_3|x_2) \cdots p(x_n|x_{n-1})$
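
A minimal sketch of this factorization in code (assuming the three-state transition matrix from the example that follows, and an arbitrarily chosen initial distribution):

```python
# A minimal sketch: joint probability of a state sequence via the chain rule.
import numpy as np

P = np.array([[1/3, 2/3, 0],
              [1/4, 0,  3/4],
              [1/2, 1/2, 0]])        # p_ij = P(next = x_j | current = x_i)
pi0 = np.array([1/3, 1/3, 1/3])      # assumed initial distribution p(x_1)

def joint_prob(states, P, pi0):
    """p(x_1,...,x_n) = p(x_1) * prod_i p(x_i | x_{i-1})."""
    prob = pi0[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= P[prev, cur]
    return prob

# Probability of the state sequence x_1 x_2 x_3 x_2 (0-indexed states 0,1,2,1)
print(joint_prob([0, 1, 2, 1], P, pi0))   # = 1/3 * 2/3 * 3/4 * 1/2 = 1/12
```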

Markov chain characterization

Definition: A Markov chain is characterized by
- a state transition matrix $P = [p(x_j|x_i)]_{i,j \in \{1,2,\ldots,k\}} = [p_{ij}]_{i,j \in \{1,2,\ldots,k\}}$, where $p_{ij} \ge 0$ and $\sum_j p_{ij} = 1$, and
- a finite set of states $X \in \{x_1, x_2, \ldots, x_k\}$, where the state determines everything about the past.
The state transition graph describes the behaviour of the process.

Example

A three-state Markov chain with state space $X \in \{x_1, x_2, x_3\}$ and state transition matrix
$P = \begin{pmatrix} 1/3 & 2/3 & 0 \\ 1/4 & 0 & 3/4 \\ 1/2 & 1/2 & 0 \end{pmatrix}$

[State transition graph: a self-loop at $x_1$ with probability 1/3, $x_1 \to x_2$ with 2/3, $x_2 \to x_1$ with 1/4, $x_2 \to x_3$ with 3/4, $x_3 \to x_1$ with 1/2 and $x_3 \to x_2$ with 1/2.]
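
For illustration (not from the lecture), the chain can also be simulated directly from $P$; the empirical state frequencies anticipate the asymptotic distribution studied on the following slides.

```python
# A minimal sketch: simulate the example chain and count state visits.
import numpy as np

P = np.array([[1/3, 2/3, 0],
              [1/4, 0,  3/4],
              [1/2, 1/2, 0]])
rng = np.random.default_rng(0)

state = 0                            # start in x_1 (arbitrary choice)
counts = np.zeros(3)
for _ in range(100_000):
    counts[state] += 1
    state = rng.choice(3, p=P[state])

print("Empirical frequencies:", counts / counts.sum())
# close to (0.349, 0.372, 0.279), the stationary distribution derived below
```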

Markov chain

Theorem: Given a Markov chain with $k$ states, let the distribution for the states at time $n$ be
$\pi^{(n)} = (\pi^{(n)}_1 \; \pi^{(n)}_2 \; \ldots \; \pi^{(n)}_k)$
Then
$\pi^{(n)} = \pi^{(1)} P^{\,n-1}$
where $\pi^{(1)}$ is the initial distribution at time 1.

Example, asymptotic distribution

$P^2 = P \cdot P = \frac{1}{72}\begin{pmatrix} 20 & 16 & 36 \\ 33 & 39 & 0 \\ 21 & 24 & 27 \end{pmatrix}$

$P^4 = P^2 \cdot P^2 = \frac{1}{5184}\begin{pmatrix} 1684 & 1808 & 1692 \\ 1947 & 2049 & 1188 \\ 1779 & 1920 & 1485 \end{pmatrix}$

$P^8 = P^4 \cdot P^4 \approx \begin{pmatrix} 0.3485 & 0.3720 & 0.2794 \\ 0.3491 & 0.3721 & 0.2788 \\ 0.3489 & 0.3722 & 0.2789 \end{pmatrix}$

The rows of $P^n$ all approach the same distribution as $n$ grows.
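
The same powers can be reproduced with a few lines of numpy (a sketch, assuming the transition matrix from the example slide):

```python
# A minimal sketch: watch every row of P^n approach the same distribution.
import numpy as np

P = np.array([[1/3, 2/3, 0],
              [1/4, 0,  3/4],
              [1/2, 1/2, 0]])

for n in (2, 4, 8, 16):
    Pn = np.linalg.matrix_power(P, n)
    print(f"P^{n} =\n{np.round(Pn, 4)}\n")
```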

Markov chain

Theorem: Let $\pi = (\pi_1 \; \ldots \; \pi_k)$ be an asymptotic distribution of the state probabilities. Then
- $\sum_j \pi_j = 1$
- $\pi$ is a stationary distribution, i.e. $\pi P = \pi$
- $\pi$ is the unique stationary distribution for the source.
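
A sketch of how the stationary distribution can be computed directly (assuming the example matrix): $\pi$ is a left eigenvector of $P$ for eigenvalue 1, normalized so its entries sum to 1.

```python
# A minimal sketch: stationary distribution as the left eigenvector of P.
import numpy as np

P = np.array([[1/3, 2/3, 0],
              [1/4, 0,  3/4],
              [1/2, 1/2, 0]])

eigvals, eigvecs = np.linalg.eig(P.T)      # left eigenvectors of P
i = np.argmin(np.abs(eigvals - 1))         # eigenvalue closest to 1
pi = np.real(eigvecs[:, i])
pi = pi / pi.sum()                         # normalize to a probability vector

print("pi     =", pi)                      # ~ (15/43, 16/43, 12/43)
print("pi @ P =", pi @ P)                  # equals pi (stationarity)
```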

Entropy rate of Markov chain

Theorem: For a stationary Markov chain with stationary distribution $\pi$ and transition matrix $P$, the entropy rate can be derived as
$H_\infty(X) = \sum_i \pi_i H(X_2 | X_1 = x_i)$
where
$H(X_2 | X_1 = x_i) = -\sum_j p_{ij} \log p_{ij}$
is the entropy of row $i$ in $P$.

Example, entropy rate

With
$P = \begin{pmatrix} 1/3 & 2/3 & 0 \\ 1/4 & 0 & 3/4 \\ 1/2 & 1/2 & 0 \end{pmatrix}$
the entropy per row is
$H(X_2|X_1 = x_1) = h(1/3)$, $H(X_2|X_1 = x_2) = h(1/4)$, $H(X_2|X_1 = x_3) = h(1/2) = 1$
Hence, with the stationary distribution $\pi = (15/43 \; 16/43 \; 12/43)$,
$H_\infty(X) = \frac{15}{43} h(1/3) + \frac{16}{43} h(1/4) + \frac{12}{43} h(1/2) \approx 0.90$ bit/source symbol
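
The number can be checked with a short sketch (assuming the example matrix and its stationary distribution $\pi = (15/43, 16/43, 12/43)$):

```python
# A minimal sketch: H_inf(X) = sum_i pi_i * H(row i of P).
import numpy as np

P = np.array([[1/3, 2/3, 0],
              [1/4, 0,  3/4],
              [1/2, 1/2, 0]])
pi = np.array([15/43, 16/43, 12/43])       # stationary distribution of P

def row_entropy(row):
    p = row[row > 0]
    return -np.sum(p * np.log2(p))          # entropy of the transition pmf

H_rate = sum(pi_i * row_entropy(row) for pi_i, row in zip(pi, P))
print(f"Entropy rate: {H_rate:.4f} bit/source symbol")   # ~ 0.90
```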

Data processing lemma

$X \to$ Process A $\to Y \to$ Process B $\to Z$

Lemma (Data Processing Lemma): If the random variables $X$, $Y$ and $Z$ form a Markov chain, $X \to Y \to Z$, we have
$I(X; Z) \le I(X; Y)$
$I(X; Z) \le I(Y; Z)$

Conclusion: The amount of information cannot increase by data processing, neither pre- nor post-processing. It can only be transformed (or destroyed).
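
A small numerical check of the lemma (not from the lecture): let $Y$ be $X$ passed through an assumed binary symmetric channel with crossover 0.1, and $Z$ be $Y$ passed through another with crossover 0.2, so that $X \to Y \to Z$.

```python
# A minimal sketch: verify I(X;Z) <= I(X;Y) for an assumed cascade of two BSCs.
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits from a joint pmf matrix p_xy."""
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2((p_xy / (px * py))[mask])))

p_x = np.array([0.5, 0.5])                 # assumed input distribution
A = np.array([[0.9, 0.1], [0.1, 0.9]])     # Process A: BSC(0.1), rows are P(y|x)
B = np.array([[0.8, 0.2], [0.2, 0.8]])     # Process B: BSC(0.2), rows are P(z|y)

p_xy = p_x[:, None] * A                    # joint P(x, y)
p_xz = p_x[:, None] * (A @ B)              # joint P(x, z); Z depends on X only via Y

print("I(X;Y) =", mutual_information(p_xy))
print("I(X;Z) =", mutual_information(p_xz))   # smaller: processing adds no information
```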