
LECTURE 3

Last time:
Mutual information
Convexity and concavity
Jensen's inequality
Information inequality
Data processing theorem
Fano's inequality

Lecture outline:
Stochastic processes, entropy rate
Markov chains
Random walks on graphs
Hidden Markov models

Reading: Chapter 4.

Quick Review

Mutual information:
\[ I(X;Y) = H(X) - H(X|Y) = \sum_{x,y} P_{X,Y}(x,y) \log \frac{P_{X,Y}(x,y)}{P_X(x) P_Y(y)} = D(P_{X,Y} \| P_X P_Y) \]

Chain rule of mutual information: \( I(X_1, X_2; Y) = I(X_1; Y) + I(X_2; Y | X_1) \).

\( D(p \| q) \ge 0 \).

Entropy \( H(X) \) is concave in \( P_X \); mutual information \( I(X;Y) \) is concave in \( P_X \) for fixed \( P_{Y|X} \), and convex in \( P_{Y|X} \) for fixed \( P_X \).

If \( X \to Y \to Z \) forms a Markov chain, then \( I(X;Y) \ge I(X;Z) \).
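As a quick numeric sanity check, here is a minimal Python sketch (the joint PMF is an assumed toy example, not from the lecture) confirming that these expressions for \( I(X;Y) \) agree:

```python
import numpy as np

# Assumed toy joint PMF P_{X,Y} on a 2x2 alphabet (rows: x, columns: y).
Pxy = np.array([[0.3, 0.1],
                [0.2, 0.4]])
Px, Py = Pxy.sum(axis=1), Pxy.sum(axis=0)   # marginals

def H(p):
    """Entropy in bits of a PMF given as an array, ignoring zero entries."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# I(X;Y) = H(X) - H(X|Y), with H(X|Y) = H(X,Y) - H(Y).
i1 = H(Px) - (H(Pxy.ravel()) - H(Py))
# I(X;Y) = sum_{x,y} P(x,y) log [P(x,y) / (P(x) P(y))] = D(P_XY || P_X P_Y).
i2 = (Pxy * np.log2(Pxy / np.outer(Px, Py))).sum()
print(i1, i2)   # both print the same value (about 0.125 bits)
```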

Fano's lemma

Suppose we have r.v.s \(X\) and \(Y\); Fano's lemma bounds the error we expect when estimating \(X\) from \(Y\). We form an estimator of \(X\), \(\hat{X} = g(Y)\). The probability of error is \(P_e = \Pr(\hat{X} \ne X)\). Let \(E\) be the indicator function for an error: \(E = 0\) when \(\hat{X} = X\) and \(E = 1\) otherwise, so \(P_e = P(E = 1)\).

Fano's lemma: \( H(E) + P_e \log(|\mathcal{X}| - 1) \ge H(X|Y) \).

Proof of Fano's lemma

Expand \(H(E, X | Y)\) two ways:
\[ H(E, X | Y) = H(X|Y) + H(E | X, Y) = H(X|Y), \]
since \(E\) is a function of \(X\) and \(Y\). Also
\[ H(E, X | Y) = H(E|Y) + H(X | E, Y) \le H(E) + H(X | E, Y) \]
\[ = H(E) + P_e H(X | E = 1, Y) + (1 - P_e) H(X | E = 0, Y) \]
\[ = H(E) + P_e H(X | E = 1, Y) \]
\[ \le H(E) + P_e H(X | E = 1) \]
\[ \le H(E) + P_e \log(|\mathcal{X}| - 1), \]
where \(H(X | E = 0, Y) = 0\) because \(E = 0\) means \(X = g(Y)\), and given \(E = 1\), \(X\) can take at most \(|\mathcal{X}| - 1\) values. Works well (is tight) when \(|\mathcal{X}|\) is large.
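A small numeric check of the lemma is easy; the sketch below uses an assumed random joint PMF and the MAP estimator (Fano holds for any estimator, so this is just one instance):

```python
import numpy as np

rng = np.random.default_rng(1)
Pxy = rng.random((4, 3))
Pxy /= Pxy.sum()                         # joint PMF with |X| = 4, |Y| = 3
Px, Py = Pxy.sum(axis=1), Pxy.sum(axis=0)

def H(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()       # entropy in bits

H_X_given_Y = H(Pxy.ravel()) - H(Py)     # H(X|Y) = H(X,Y) - H(Y)

g = Pxy.argmax(axis=0)                   # MAP estimator: x_hat = g(y)
Pe = 1 - sum(Pxy[g[y], y] for y in range(3))   # P(X_hat != X)
bound = H(np.array([Pe, 1 - Pe])) + Pe * np.log2(4 - 1)
print(H_X_given_Y, "<=", bound)          # Fano: H(X|Y) <= H(E) + Pe log(|X|-1)
```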

Stochastic processes

A stochastic process is an indexed sequence of r.v.s \(X_0, X_1, \ldots\), i.e., a map from \(\Omega\) to \(\mathcal{X}^\infty\). A stochastic process is characterized by its joint PMFs:
\[ P_{X_0, X_1, \ldots, X_n}(x_0, x_1, \ldots, x_n), \quad (x_0, x_1, \ldots, x_n) \in \mathcal{X}^{n+1}, \]
for \(n = 0, 1, \ldots\)

The entropy of a stochastic process:
\[ H(X_1, X_2, \ldots) = H(X_1) + H(X_2 | X_1) + \cdots + H(X_i | X_1, \ldots, X_{i-1}) + \cdots \]

Difficulties: the sum goes to infinity, and in general all the terms are different.

Entropy Rate

The entropy rate of a random process, if it exists, is
\[ \lim_{n \to \infty} \frac{1}{n} H(X_1^n). \]

Examples: an i.i.d. sequence of r.v.s (the rate is \(H(X_1)\)); i.i.d. blocks of r.v.s.

A stochastic process is stationary if
\[ P_{X_0, X_1, \ldots, X_n}(x_0, x_1, \ldots, x_n) = P_{X_l, X_{l+1}, \ldots, X_{l+n}}(x_0, x_1, \ldots, x_n) \]
for every shift \(l\) and all \((x_0, x_1, \ldots, x_n) \in \mathcal{X}^{n+1}\). For stationary processes, the limit exists.

Entropy Rate of Stationary Processes

Chain rule:
\[ \frac{1}{n} H(X_1, X_2, \ldots, X_n) = \frac{1}{n} \sum_{i=1}^{n} H(X_i | X_1, \ldots, X_{i-1}). \]

If the individual terms on the RHS converge to a limit, then their Cesàro means, i.e. the LHS, converge to the same limit. For a stationary process,
\[ H(X_{n+1} | X_1^n) \le H(X_{n+1} | X_2^n) = H(X_n | X_1^{n-1}), \]
where the inequality holds because conditioning reduces entropy and the equality follows from stationarity. Therefore the sequence \(H(X_n | X_1^{n-1})\) is non-increasing and non-negative, so its limit exists.

Theorem: For stationary processes, the entropy rate is
\[ \lim_{n \to \infty} \frac{1}{n} H(X_1^n) = \lim_{n \to \infty} H(X_n | X_1^{n-1}). \]
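The Cesàro-mean fact used here (if \(a_n \to a\), then \(\frac{1}{n} \sum_{i=1}^{n} a_i \to a\)) is easy to see numerically; a tiny illustration with an assumed sequence:

```python
# If a_n -> a, the running averages converge to a as well (just more slowly).
a = [1 + 1 / n for n in range(1, 10001)]     # a_n = 1 + 1/n -> 1
avg = sum(a) / len(a)
print(a[-1], avg)                            # 1.0001 and ~1.001: both near 1
```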

Markov Chain

A discrete stochastic process is a Markov chain if
\[ P_{X_n | X_0, \ldots, X_{n-1}}(x_n | x_0, \ldots, x_{n-1}) = P_{X_n | X_{n-1}}(x_n | x_{n-1}) \]
for \(n = 1, 2, \ldots\) and all \((x_0, x_1, \ldots, x_n) \in \mathcal{X}^{n+1}\).

\(X_n\), the state after \(n\) transitions, belongs to a finite set, e.g., \(\{1, \ldots, m\}\); \(X_0\) is either given or random.

Time-Invariant Markov Processes

The transition probabilities are time-invariant:
\[ p_{i,j} = P(X_{n+1} = j | X_n = i) = P(X_{n+1} = j | X_n = i, X_{n-1}, \ldots, X_0). \]

The Markov chain is characterized by the probability transition matrix \(P = [p_{i,j}]\).

Question: stationary vs. time-invariant. Let \(r_i(n) = P(X_n = i)\), conditioned on an initial state or averaged over a random initial state. Key recursion:
\[ r_j(n+1) = \sum_i r_i(n) \, p_{i,j}, \quad \text{or} \quad r(n+1) = r(n) P. \]
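A minimal Python sketch of this recursion (the transition matrix is an assumed example, not from the lecture):

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])   # assumed 3-state transition matrix, rows sum to 1

r = np.array([1.0, 0.0, 0.0])     # X_0 = state 0 with probability 1
for n in range(100):
    r = r @ P                     # r_j(n+1) = sum_i r_i(n) p_{i,j}
print(r)                          # the distribution settles to a fixed point
```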

Review of Markov chains

Is there always a solution of \(\pi = \pi P\) that is a probability vector? Is that solution unique? Starting from any initial state (fixed or random), does the state distribution converge to \(\pi\)?

For a Markov chain with a single class of recurrent aperiodic states, there is a unique stationary distribution \(\pi\), and each row of \(P^n\) converges to \(\pi\).

The entropy rate of a Markov chain:
\[ \lim_{n \to \infty} \frac{1}{n} H(X_1^n) = \lim_{n \to \infty} H(X_n | X_{n-1}) = -\sum_{i,j} \pi_i \, p_{i,j} \log p_{i,j}. \]
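For a concrete chain, \(\pi\) and the entropy rate are easy to compute; a sketch (reusing the assumed 3-state matrix from the previous sketch) that finds \(\pi\) as the left eigenvector of \(P\) for eigenvalue 1:

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])   # assumed 3-state transition matrix

# Stationary distribution: left eigenvector of P with eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

# Entropy rate: H = -sum_{i,j} pi_i p_{i,j} log2 p_{i,j}  (bits per step)
H_rate = -sum(pi[i] * P[i, j] * np.log2(P[i, j])
              for i in range(3) for j in range(3) if P[i, j] > 0)
print(pi, H_rate)
```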

Random walk on a graph

Example: a random walk on a 3×3 chessboard, with squares numbered
1 2 3
4 5 6
7 8 9

From square 2, the walk moves to one of its 5 neighbors uniformly:
\[ p_{2,\cdot} = \left( \tfrac{1}{5}, \, 0, \, \tfrac{1}{5}, \, \tfrac{1}{5}, \, \tfrac{1}{5}, \, \tfrac{1}{5}, \, 0, \, 0, \, 0 \right). \]

Conditioned on \(X_{n-1} = 2\), observing \(X_n\) gives \(\log_2 5\) bits of information. The entropy rate is
\[ 4 \pi_1 \log 3 + 4 \pi_2 \log 5 + \pi_5 \log 8 \]
(corner squares, edge squares, and the center have 3, 5, and 8 neighbors, respectively). For an \(n \times n\) chessboard with \(n \to \infty\), the entropy rate approaches \(\log 8\).
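A quick numeric check of these values (assuming, as the degrees 3/5/8 indicate, king-style moves to any adjacent square, chosen uniformly):

```python
import math

# Degrees on the 3x3 board under king moves: corners 3, edge squares 5, center 8.
deg = {1: 3, 2: 5, 3: 3, 4: 5, 5: 8, 6: 5, 7: 3, 8: 5, 9: 3}
twoW = sum(deg.values())                       # 2W = 40
pi = {i: d / twoW for i, d in deg.items()}     # pi_1 = 3/40, pi_2 = 5/40, pi_5 = 8/40

# Moves out of square i are uniform over deg(i) neighbors, so
# H(X_n | X_{n-1} = i) = log2 deg(i); average over pi:
H_rate = sum(pi[i] * math.log2(deg[i]) for i in deg)
print(twoW, pi[1], pi[2], pi[5], H_rate)       # 40, 0.075, 0.125, 0.2, ~2.24 bits
```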

Random walk on a graph

Consider an undirected graph \(G = (N, E, W)\), where \(N\), \(E\), \(W\) are the nodes, edges, and weights. With each edge \((i,j)\) there is an associated weight \(W_{i,j}\), with
\[ W_{i,j} = W_{j,i}, \qquad W_i = \sum_j W_{i,j}, \qquad W = \sum_{i,j \,:\, j > i} W_{i,j}, \qquad 2W = \sum_i W_i. \]

We call a random walk on the graph the Markov chain whose states are the nodes, with
\[ p_{i,j} = \frac{W_{i,j}}{W_i}, \qquad \pi_i = \frac{W_i}{2W}. \]

Random Walk on a Graph

Check: \(\sum_i \pi_i = 1\), and
\[ \sum_i \pi_i \, p_{i,j} = \sum_i \frac{W_i}{2W} \cdot \frac{W_{i,j}}{W_i} = \sum_i \frac{W_{i,j}}{2W} = \frac{W_j}{2W} = \pi_j. \]

Back to the example: for the random walk on the 3×3 chessboard, \(W_{i,j} = 1\) for all connected \(i, j\), so \(2W = 40\) and
\[ \pi_1 = \frac{3}{40}, \qquad \pi_2 = \frac{5}{40}, \qquad \pi_5 = \frac{8}{40}. \]

Random walk on a graph

\[ H(X_2 | X_1) = -\sum_i \pi_i \sum_j p_{i,j} \log p_{i,j} \]
\[ = -\sum_i \frac{W_i}{2W} \sum_j \frac{W_{i,j}}{W_i} \log \frac{W_{i,j}}{W_i} \]
\[ = -\sum_{i,j} \frac{W_{i,j}}{2W} \log \frac{W_{i,j}}{W_i} \]
\[ = -\sum_{i,j} \frac{W_{i,j}}{2W} \log \frac{W_{i,j}}{2W} + \sum_{i,j} \frac{W_{i,j}}{2W} \log \frac{W_i}{2W} \]
\[ = -\sum_{i,j} \frac{W_{i,j}}{2W} \log \frac{W_{i,j}}{2W} + \sum_i \frac{W_i}{2W} \log \frac{W_i}{2W}. \]

The entropy rate is the difference of two entropies: the entropy of the edge-weight distribution \(\{W_{i,j}/2W\}\) minus the entropy of the node-weight distribution \(\{W_i/2W\}\).
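Both the stationarity check and the two-entropies identity are easy to verify numerically; a sketch with an assumed random symmetric weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((5, 5))
W = (W + W.T) / 2                    # symmetric weights, W_{i,j} = W_{j,i}
np.fill_diagonal(W, 0)               # no self-loops
Wi = W.sum(axis=1)                   # W_i = sum_j W_{i,j}
twoW = Wi.sum()                      # 2W

P = W / Wi[:, None]                  # p_{i,j} = W_{i,j} / W_i
pi = Wi / twoW                       # claimed stationary distribution

assert np.allclose(pi @ P, pi)       # check: sum_i pi_i p_{i,j} = pi_j

def H(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()   # entropy in bits

direct = -sum(pi[i] * P[i, j] * np.log2(P[i, j])
              for i in range(5) for j in range(5) if P[i, j] > 0)
diff = H((W / twoW).ravel()) - H(Wi / twoW)   # edge entropy minus node entropy
print(direct, diff)                  # the two values agree
```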

Hidden Markov models

Consider an ALOHA wireless model: \(M\) users share the same radio channel to transmit packets to a base station. During each time slot, a packet arrives to a user's queue with probability \(p\), independently of the other \(M - 1\) users. Also, at the beginning of each time slot, if a user has at least one packet in its queue, it transmits a packet with probability \(q\), independently of all other users. If two packets collide at the receiver, they are not successfully transmitted and remain in their respective queues.

Hidden Markov models

Let \(X_i = (n_1, n_2, \ldots, n_M)\) denote the random vector at time \(i\), where \(n_m\) is the number of packets in user \(m\)'s queue. \(X_i\) is a Markov chain. Consider the random vector \(Y_i = (y_1, y_2, \ldots, y_M)\), where \(y_m = 1\) if user \(m\) transmits during time slot \(i\) and \(y_m = 0\) otherwise. Is \(Y_i\) Markov?

Hidden Markov processes

Let \(\ldots, X_1, X_2, \ldots\) be a stationary Markov chain and let \(Y_i = \phi(X_i)\) be a process each term of which is a function of the corresponding state of the Markov chain. Then \(\ldots, Y_1, Y_2, \ldots\) form a hidden Markov process, which is not always a Markov chain, but is still stationary.

What is its entropy rate? We can compute \(H(Y_n | Y_1^{n-1})\), which monotonically decreases with \(n\) and upper bounds the entropy rate. We need a matching lower bound on the entropy rate.

Genie Trick

We want to construct another sequence \(b_n\) that lower bounds the entropy rate limit \(\lim_{n} H(Y_n | Y_1^{n-1})\), yet has the same limit. To lower bound an entropy, give side information to a genie. The genie's information has to be small, but enough to tip the scale. Choose \(b_n = H(Y_n | Y_1^{n-1}, X_1)\).

Claim: \( H(Y_n | Y_1^{n-1}, X_1) \le \lim_{n \to \infty} H(Y_n | Y_1^{n-1}) \).

Indeed, for any \(k\),
\[ H(Y_n | Y_1^{n-1}, X_1) = H(Y_n | Y_1^{n-1}, X_1, X_0, \ldots, X_{-k}) = H(Y_n | Y_1^{n-1}, X_{-k}^1) \le H(Y_n | Y_{-k}^{n-1}), \]
and by stationarity the right-hand side equals \(H(Y_{n+k+1} | Y_1^{n+k})\), which tends to the limit as \(k \to \infty\). All the information about the history is captured in \(X_1\) (the Markov property), and conditioning on \(X_{-k}^1\) gives at most as much entropy as conditioning on \(Y_{-k}^0\), which is a function of it.

Hidden Markov processes

Claim: \( H(Y_n | Y_1^{n-1}) - H(Y_n | Y_1^{n-1}, X_1) = I(X_1; Y_n | Y_1^{n-1}) \to 0 \).

Indeed,
\[ \lim_{n \to \infty} I(X_1; Y_1^n) = \lim_{n \to \infty} \sum_{i=1}^{n} I(X_1; Y_i | Y_1^{i-1}) = \sum_{i=1}^{\infty} I(X_1; Y_i | Y_1^{i-1}); \]
since we have an infinite sum of non-negative terms that is upper bounded by \(H(X_1)\), the terms must tend to 0. Hence the upper and lower bounds converge to the same limit, the entropy rate of \(\{Y_i\}\).
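The sandwich is easy to see numerically. A minimal sketch with an assumed 3-state chain whose states 1 and 2 share an output label (so \(Y\) is genuinely hidden); it computes the upper bound \(H(Y_n | Y_1^{n-1})\) and the genie lower bound \(H(Y_n | Y_1^{n-1}, X_1)\) by brute-force enumeration of output sequences with the forward recursion:

```python
import itertools
import numpy as np

P = np.array([[0.2, 0.5, 0.3],        # assumed 3-state transition matrix
              [0.6, 0.1, 0.3],
              [0.3, 0.3, 0.4]])
phi = np.array([0, 1, 1])             # Y_i = phi(X_i); states 1 and 2 look alike

w, v = np.linalg.eig(P.T)             # stationary distribution of X
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

def seq_prob(y, init):
    """P(Y_1..Y_n = y) when X_1 has distribution init (forward recursion)."""
    alpha = init * (phi == y[0])      # alpha_1(x) = P(X_1 = x, Y_1 = y_1)
    for t in range(1, len(y)):
        alpha = (alpha @ P) * (phi == y[t])
    return alpha.sum()

def block_H(n, init):
    """H(Y_1..Y_n) in bits, for the given distribution of X_1."""
    total = 0.0
    for y in itertools.product([0, 1], repeat=n):
        p = seq_prob(y, init)
        if p > 0:
            total -= p * np.log2(p)
    return total

for n in range(2, 9):
    upper = block_H(n, pi) - block_H(n - 1, pi)          # H(Y_n | Y^{n-1})
    lower = sum(pi[x] * (block_H(n, np.eye(3)[x])        # H(Y_n | Y^{n-1}, X_1)
                         - block_H(n - 1, np.eye(3)[x])) for x in range(3))
    print(n, round(lower, 5), round(upper, 5))           # lower <= rate <= upper
```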