Capacity of a channel: Shannon's second theorem

Outline
1. Memoryless channels, examples;
2. Capacity;
3. Symmetric channels;
4. Channel coding;
5. Shannon's second theorem, proof.

1. Memoryless channels

Definition. A discrete channel is given by
- an input alphabet $\mathcal{X} = \{a_1, \dots, a_K\}$,
- an output alphabet $\mathcal{Y} = \{b_1, \dots, b_J\}$,
- transition probabilities $P_{Y|X}$, i.e. a stochastic matrix
$$\Pi = \begin{pmatrix} P(b_1 \mid a_1) & \dots & P(b_J \mid a_1) \\ \vdots & & \vdots \\ P(b_1 \mid a_K) & \dots & P(b_J \mid a_K) \end{pmatrix}$$
(each row is a probability distribution).

The channel is memoryless if for every transmitted $(x_1, \dots, x_n)$ and every received $(y_1, \dots, y_n)$,
$$P(y_1, \dots, y_n \mid x_1, \dots, x_n) = P(y_1 \mid x_1) \cdots P(y_n \mid x_n).$$
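
To fix notation, here is a minimal sketch (the function names are mine, not from the slides) of a discrete memoryless channel represented by its stochastic matrix, together with the memoryless product formula above.

```python
import numpy as np

# A minimal sketch: a discrete memoryless channel as a stochastic matrix Pi,
# with Pi[x, y] = P(Y = y | X = x).  Every row sums to 1.
def channel_matrix_bsc(p: float) -> np.ndarray:
    """Binary symmetric channel with crossover probability p."""
    return np.array([[1 - p, p],
                     [p, 1 - p]])

def prob_output_given_input(Pi: np.ndarray, x_seq, y_seq) -> float:
    """Memorylessness: P(y_1..y_n | x_1..x_n) = prod_i P(y_i | x_i)."""
    return float(np.prod([Pi[x, y] for x, y in zip(x_seq, y_seq)]))

if __name__ == "__main__":
    Pi = channel_matrix_bsc(0.1)
    # Probability of receiving 0,1,1 when 0,1,0 was sent: 0.9 * 0.9 * 0.1
    print(prob_output_given_input(Pi, [0, 1, 0], [0, 1, 1]))  # 0.081
```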

Example: binary symmetric channel

[Transition diagram: input 0 is received as 0 with probability 1-p and as 1 with probability p; input 1 is received as 1 with probability 1-p and as 0 with probability p.]

The stochastic matrix is
$$\Pi = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}.$$
p is called the crossover probability of the channel.

Erasure channel

[Transition diagram: each input bit is received correctly with probability 1-p and erased (output ε) with probability p.]

$$\Pi = \begin{pmatrix} 1-p & p & 0 \\ 0 & p & 1-p \end{pmatrix}$$
(columns indexed by the outputs 0, ε, 1).

2. Capacity

The capacity of a channel is defined as the maximum mutual information between a random variable X taking its values in the input alphabet and the corresponding output Y of the channel:
$$C \stackrel{\text{def}}{=} \sup_{X} I(X; Y), \qquad X \to \text{channel} \to Y.$$

Capacity

It is useful to note that I(X; Y) can be written using only the input distribution and the transition probabilities:
$$I(X; Y) = \sum_{x,y} P(y \mid x)\, P(x) \log_2 \frac{P(y \mid x)}{P(y)}, \qquad P(y) = \sum_x P(y \mid x)\, P(x).$$
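
The formula above translates directly into code. The following sketch (my own naming, assuming numpy) computes I(X; Y) in bits from an input distribution and a transition matrix.

```python
import numpy as np

# Mutual information I(X;Y) in bits from an input distribution p_x and a
# channel matrix Pi, following the formula on this slide.
def mutual_information(p_x: np.ndarray, Pi: np.ndarray) -> float:
    p_y = p_x @ Pi                      # P(y) = sum_x P(x) P(y|x)
    total = 0.0
    for x, px in enumerate(p_x):
        for y, py in enumerate(p_y):
            pyx = Pi[x, y]
            if px > 0 and pyx > 0:
                total += px * pyx * np.log2(pyx / py)
    return total

if __name__ == "__main__":
    Pi = np.array([[0.9, 0.1], [0.1, 0.9]])               # BSC, p = 0.1
    print(mutual_information(np.array([0.5, 0.5]), Pi))   # ~0.531 = 1 - h(0.1)
```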

Capacity of the binary erasure channel

$$C = \max_{p(x)} I(X; Y) = \max_{p(x)} \bigl( H(Y) - H(Y \mid X) \bigr).$$
Observe that
$$H(Y \mid X) = P(X = 0)\, h(p) + P(X = 1)\, h(p) = h(p),$$
where $h(p) \stackrel{\text{def}}{=} -p \log_2 p - (1-p) \log_2 (1-p)$.

Capacity of the binary erasure channel (II)

Letting $a \stackrel{\text{def}}{=} P(X = 1)$, we obtain:
$$P(Y = 1) = a(1-p), \qquad P(Y = 0) = (1-a)(1-p), \qquad P(Y = \varepsilon) = ap + (1-a)p = p,$$
$$H(Y) = -a(1-p) \log a(1-p) - (1-a)(1-p) \log (1-a)(1-p) - p \log p = (1-p)\, h(a) + h(p).$$
Therefore
$$C = \max_a (1-p)\, h(a) = 1 - p,$$
the maximum being attained at $a = 1/2$, where $h(a) = 1$.
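
As a sanity check, one can maximise I(X; Y) numerically over the input distribution of a binary erasure channel and recover C = 1 - p. A possible sketch (my own, assuming numpy):

```python
import numpy as np

# Numerical check: maximise I(X;Y) = H(Y) - H(Y|X) over the input distribution
# of a binary erasure channel and compare with the closed form C = 1 - p.
def entropy(q: np.ndarray) -> float:
    q = q[q > 0]
    return float(-np.sum(q * np.log2(q)))

def bec_mutual_information(a: float, p: float) -> float:
    Pi = np.array([[1 - p, p, 0.0],      # rows: inputs 0, 1
                   [0.0, p, 1 - p]])     # columns: outputs 0, erasure, 1
    p_x = np.array([1 - a, a])
    p_y = p_x @ Pi
    h_y_given_x = sum(px * entropy(row) for px, row in zip(p_x, Pi))
    return entropy(p_y) - h_y_given_x

p = 0.3
best = max(bec_mutual_information(a, p) for a in np.linspace(0.001, 0.999, 999))
print(best, 1 - p)   # both ~0.7, maximum reached at a = 1/2
```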

Z-channel

[Transition diagram: input 0 always gives output 0; input 1 gives output 1 with probability 1-p and output 0 with probability p.]

The stochastic matrix is
$$\Pi = \begin{pmatrix} 1 & 0 \\ p & 1-p \end{pmatrix}.$$
For an input distribution with $x \stackrel{\text{def}}{=} P(X = 1)$, we have
$$I(X; Y) = h\bigl(x(1-p)\bigr) - x\, h(p).$$
The maximum is attained at
$$x = \frac{1}{(1-p)\bigl(1 + 2^{h(p)/(1-p)}\bigr)}.$$
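
The closed form above can be checked numerically; the following sketch (my own naming) compares a grid search over x with the formula for the maximiser, for p = 0.2:

```python
import numpy as np

# Z-channel check: maximise I(X;Y) = h(x(1-p)) - x*h(p) numerically and
# compare the argmax with the closed form on this slide.
def h(q: float) -> float:
    if q in (0.0, 1.0):
        return 0.0
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def z_channel_mi(x: float, p: float) -> float:
    return h(x * (1 - p)) - x * h(p)

p = 0.2
grid = np.linspace(0.0, 1.0, 10001)
x_num = grid[np.argmax([z_channel_mi(x, p) for x in grid])]
x_closed = 1.0 / ((1 - p) * (1 + 2 ** (h(p) / (1 - p))))
print(x_num, x_closed)            # both ~0.436
print(z_channel_mi(x_closed, p))  # Z-channel capacity ~0.618 bits
```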

Symmetric channels

Definition. A discrete memoryless channel is symmetric if each row (resp. column) of $\Pi$ is a permutation of the first row (resp. column).

Proposition 1. In a symmetric channel, $H(Y \mid X)$ does not depend on X:
$$H(Y \mid X) = -\sum_y P(y \mid x) \log_2 P(y \mid x).$$

Proof.
$$H(Y \mid X) = -\sum_x P(x) \sum_y P(y \mid x) \log_2 P(y \mid x) = \sum_x P(x)\, H(\Pi) = H(\Pi),$$
where $H(\Pi) = -\sum_y P(y \mid x) \log_2 P(y \mid x)$ does not depend on x, since every row of $\Pi$ is a permutation of the first row.

Capacity of a symmetric channel

Hence
$$C = \sup_X I(X; Y) = \sup_X \bigl( H(Y) - H(Y \mid X) \bigr) = \sup_X H(Y) - H(\Pi) \le \log_2 |\mathcal{Y}| - H(\Pi).$$
The entropy H(Y) is maximised when Y is uniform. Note that for a symmetric channel, Y is uniform when X is uniform.

Capacity of a symmetric channel (II)

Proposition 2. The capacity of a symmetric channel is attained for a uniform distribution on the inputs and is equal to
$$C = \log_2 |\mathcal{Y}| - H(\Pi).$$

Example

Capacity of the binary symmetric channel:
$$C = 1 - h(p).$$
Compare with the capacity of the binary erasure channel: $C = 1 - p$.
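
To make Proposition 2 concrete, here is a small sketch (my own naming, assuming numpy) that computes log2|Y| - H(Π) for the binary symmetric channel and recovers 1 - h(p):

```python
import numpy as np

# Capacity of a symmetric channel via Proposition 2: C = log2 |Y| - H(Pi),
# where H(Pi) is the entropy of any row of the stochastic matrix.
def row_entropy(row: np.ndarray) -> float:
    row = row[row > 0]
    return float(-np.sum(row * np.log2(row)))

def symmetric_channel_capacity(Pi: np.ndarray) -> float:
    return float(np.log2(Pi.shape[1]) - row_entropy(Pi[0]))

p = 0.1
bsc = np.array([[1 - p, p], [p, 1 - p]])
print(symmetric_channel_capacity(bsc))        # ~0.531
print(1 - row_entropy(np.array([p, 1 - p])))  # same value: 1 - h(p)
```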

3. Channel coding

Consider a discrete memoryless channel $T = (\mathcal{X}, \mathcal{Y}, \Pi)$.

Definition. A block code of length n and cardinality M is a set of M sequences of n symbols of $\mathcal{X}$; it is called an (M, n)-code and its elements are called codewords. The code rate is
$$R = \frac{\log_2 M}{n}.$$
Such a code encodes $\log_2 M$ bits per codeword transmission, so R is the number of transmitted bits per channel use. An encoder is a procedure that maps a finite binary sequence to a finite sequence of elements of $\mathcal{X}$.
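
For example, the length-7 Hamming code listed a few slides below has M = 16 codewords, hence rate R = log2(16)/7 = 4/7 ≈ 0.57 bits per channel use.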

Code performance: decoding

Let $\mathcal{C}$ be an (M, n)-block code used over a discrete memoryless channel $(\mathcal{X}, \mathcal{Y}, \Pi)$.

Definition. A decoding algorithm for $\mathcal{C}$ is a procedure which maps any block of n symbols of $\mathcal{Y}$ to a codeword of $\mathcal{C}$. The event "bad decoding" of a decoding algorithm is defined by: a codeword $x \in \mathcal{C} \subset \mathcal{X}^n$ is transmitted through the channel, the word $y \in \mathcal{Y}^n$ is received and is decoded as $x' \neq x$.

Definition. The error rate of $\mathcal{C}$ (for a given channel and transmitted codeword x), denoted $P_e(\mathcal{C}, x)$, is the probability of bad decoding when x is transmitted.

Examples for the binary symmetric channel

Repetition code of length 3: $\mathcal{C} = \{000, 111\}$.

Single parity-check code of length 4: $\mathcal{C} = \{0000, 0011, 0101, 0110, 1001, 1010, 1100, 1111\}$.
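
As an illustration (my own sketch, not from the slides), the error rate of the length-3 repetition code over a BSC(p) with majority-vote decoding can be estimated by simulation and compared with the exact value 3p²(1-p) + p³:

```python
import random

# Error rate of the repetition code {000, 111} over a BSC(p) with
# majority-vote decoding: decoding fails when at least 2 of the 3 bits flip.
def p_e_repetition3(p: float, trials: int = 200_000) -> float:
    errors = 0
    for _ in range(trials):
        flips = sum(random.random() < p for _ in range(3))
        errors += flips >= 2          # majority of the bits corrupted
    return errors / trials

p = 0.1
print(p_e_repetition3(p))             # ~0.028
print(3 * p**2 * (1 - p) + p**3)      # 0.028 exactly
```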

Hamming code of length 7

$\mathcal{C} = \{0000000, 1101000, 0110100, 0011010, 0001101, 1000110, 0100011, 1010001, 1111111, 0010111, 1001011, 1100101, 1110010, 0111001, 1011100, 0101110\}$

Decoding of the Hamming code

The Hamming distance $d(x, y)$ is
$$d(x, y) = \#\{i : x_i \neq y_i\}.$$
It can be verified that the codewords of the Hamming code are pairwise at distance at least 3, which implies that the balls of radius 1 centred on the codewords do not intersect. Moreover, any binary word of length 7 is at distance at most 1 from a Hamming codeword, since $16 \cdot (1 + 7) = 2^4 \cdot 2^3 = 2^7$.

Decoding algorithm: when y is received, return the (unique) codeword x at distance at most 1 from y.
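
The decoding rule above is just minimum-distance decoding over the 16 codewords; a small sketch (my own, the helper names are hypothetical):

```python
# Minimum-distance decoding of the length-7 Hamming code listed on the
# previous slide.  Since the minimum distance is 3, any single bit flip
# is corrected.
CODE = ["0000000", "1101000", "0110100", "0011010", "0001101", "1000110",
        "0100011", "1010001", "1111111", "0010111", "1001011", "1100101",
        "1110010", "0111001", "1011100", "0101110"]

def hamming_distance(x: str, y: str) -> int:
    return sum(a != b for a, b in zip(x, y))

def decode(y: str) -> str:
    """Return the codeword closest to y (distance <= 1 for a single error)."""
    return min(CODE, key=lambda c: hamming_distance(c, y))

if __name__ == "__main__":
    sent = "1101000"
    received = "1111000"          # one bit flipped
    print(decode(received))       # -> 1101000, the error is corrected
```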

Shannon's second theorem

Theorem 1. Consider a discrete memoryless channel of capacity C. For every R < C, there exists a sequence $(\mathcal{C}_n)_{n>0}$ of $(M_n, n)$-codes of rate $R_n = \frac{\log_2 M_n}{n}$, together with a decoding algorithm, such that
$$\lim_{n \to \infty} R_n = R \quad \text{and} \quad \lim_{n \to \infty} \sup_{x \in \mathcal{C}_n} P_e(\mathcal{C}_n, x) = 0.$$

Theorem 2. Consider a discrete memoryless channel of capacity C. Any code $\mathcal{C}$ of rate R > C satisfies
$$\frac{1}{M} \sum_{x \in \mathcal{C}} P_e(\mathcal{C}, x) > K(C, R),$$
where K(C, R) > 0 depends on the channel and the rate but is independent of the code.

Error exponent

There is an even stronger version of Shannon's theorem: there are block codes $\mathcal{C}$ of rate R and length n for which
$$\sup_{x \in \mathcal{C}} P_e(\mathcal{C}, x) \le e^{-n E(R)},$$
where E(R) is called the error exponent. It depends on the channel and the transmission rate and satisfies E(R) > 0 for R < C.

Jointly typical sequences

Definition (jointly typical set). Let $(X^{(n)}, Y^{(n)})$ be a pair of random variables taking its values in a discrete set $A^n \times B^n$, let $p(x, y)$ be the joint probability distribution of $(X^{(n)}, Y^{(n)})$, and let $p(x)$ and $p(y)$ be the distributions of $X^{(n)}$ and $Y^{(n)}$ respectively. The set $T_\epsilon^{(n)}$ of jointly typical sequences is given by
$$T_\epsilon^{(n)} = \Bigl\{ (x, y) \in A^n \times B^n :$$
$$\bigl| \tfrac{1}{n} \bigl( -\log_2 p(x) - H(X^{(n)}) \bigr) \bigr| < \epsilon, \qquad (1)$$
$$\bigl| \tfrac{1}{n} \bigl( -\log_2 p(y) - H(Y^{(n)}) \bigr) \bigr| < \epsilon, \qquad (2)$$
$$\bigl| \tfrac{1}{n} \bigl( -\log_2 p(x, y) - H(X^{(n)}, Y^{(n)}) \bigr) \bigr| < \epsilon \Bigr\}. \qquad (3)$$

Theorem 3. Let $(X_i, Y_i)$ be a sequence of i.i.d. pairs of random variables taking their values in $A \times B$, distributed as a fixed pair (X, Y). Define $T_\epsilon^{(n)}$ from $(X^{(n)}, Y^{(n)})$, where $X^{(n)} \stackrel{\text{def}}{=} (X_1, X_2, \dots, X_n)$ and $Y^{(n)} \stackrel{\text{def}}{=} (Y_1, Y_2, \dots, Y_n)$. Then
1. $\mathrm{Prob}(T_\epsilon^{(n)}) > 1 - \epsilon$ for n sufficiently large.
2. $|T_\epsilon^{(n)}| \le 2^{n(H(X,Y)+\epsilon)}$.
3. Let $\tilde{X}^{(n)}$ and $\tilde{Y}^{(n)}$ be two independent random variables with $\tilde{X}^{(n)} \sim X^{(n)}$ and $\tilde{Y}^{(n)} \sim Y^{(n)}$. Then
$$\mathrm{Prob}\bigl\{ (\tilde{X}^{(n)}, \tilde{Y}^{(n)}) \in T_\epsilon^{(n)} \bigr\} \le 2^{-n(I(X;Y) - 3\epsilon)}.$$
Moreover, for n sufficiently large,
$$\mathrm{Prob}\bigl\{ (\tilde{X}^{(n)}, \tilde{Y}^{(n)}) \in T_\epsilon^{(n)} \bigr\} \ge (1-\epsilon)\, 2^{-n(I(X;Y) + 3\epsilon)}.$$
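
The theorem can be illustrated numerically. In the sketch below (my own construction, not from the slides), X is uniform on {0,1} and Y is the output of a BSC(p); joint typicality then reduces to a condition on the number of positions where x and y differ, so the probabilities in Points 1 and 3 can be computed exactly and compared with the stated bounds.

```python
from math import lgamma, log, log2, exp, comb

# X uniform on {0,1}, Y the output of a BSC(p) with input X.  Then X^(n) and
# Y^(n) are uniform, so conditions (1) and (2) of the typicality definition
# always hold, and joint typicality only constrains k, the number of positions
# where x and y differ.
n, p, eps = 2000, 0.1, 0.05
h = lambda q: -q * log2(q) - (1 - q) * log2(1 - q)
I = 1 - h(p)                       # I(X;Y) for this joint distribution

# (x, y) jointly typical iff |(k/n)(-log2 p) + (1-k/n)(-log2(1-p)) - h(p)| < eps
ks = [k for k in range(n + 1)
      if abs((k / n) * (-log2(p)) + (1 - k / n) * (-log2(1 - p)) - h(p)) < eps]

def binom_pmf(k: int, q: float) -> float:
    """Binomial(n, q) pmf, computed in the log domain to avoid overflow."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(q) + (n - k) * log(1 - q))

# Point 1: for the true (correlated) pair, k ~ Binomial(n, p)
print(sum(binom_pmf(k, p) for k in ks))        # ~0.98 > 1 - eps

# Point 3: for an independent pair, k ~ Binomial(n, 1/2); compare the exponents
log2_prob_indep = log2(sum(comb(n, k) for k in ks)) - n
print(log2_prob_indep)                         # roughly -970
print(-n * (I - 3 * eps), -n * (I + 3 * eps))  # ~ -762 and -1362: bounds hold
```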

Proof of Point 3

$$\mathrm{Prob}\bigl\{ (\tilde{X}^{(n)}, \tilde{Y}^{(n)}) \in T_\epsilon^{(n)} \bigr\} = \sum_{(x,y) \in T_\epsilon^{(n)}} p(x)\, p(y) \le |T_\epsilon^{(n)}|\, 2^{-n(H(X)-\epsilon)}\, 2^{-n(H(Y)-\epsilon)} \le 2^{n(H(X,Y)+\epsilon)}\, 2^{-n(H(X)-\epsilon)}\, 2^{-n(H(Y)-\epsilon)} = 2^{-n(I(X;Y)-3\epsilon)}.$$

Proof of Point 3 (II)

For n sufficiently large,
$$\mathrm{Prob}\bigl\{ (\tilde{X}^{(n)}, \tilde{Y}^{(n)}) \in T_\epsilon^{(n)} \bigr\} = \sum_{(x,y) \in T_\epsilon^{(n)}} p(x)\, p(y) \ge |T_\epsilon^{(n)}|\, 2^{-n(H(X)+\epsilon)}\, 2^{-n(H(Y)+\epsilon)} \ge (1-\epsilon)\, 2^{n(H(X,Y)-\epsilon)}\, 2^{-n(H(X)+\epsilon)}\, 2^{-n(H(Y)+\epsilon)} = (1-\epsilon)\, 2^{-n(I(X;Y)+3\epsilon)},$$
using that $|T_\epsilon^{(n)}| \ge (1-\epsilon)\, 2^{n(H(X,Y)-\epsilon)}$ for n sufficiently large.

The direct part of Shannon's theorem

The crucial point: random choice of the code!

We begin by choosing a probability distribution P on the input symbols of the channel. We then choose a code of length n and rate R by drawing $2^{nR}$ words of $A^n$ at random according to the distribution $P^{(n)}$ on $A^n$ given by
$$P^{(n)}(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} P(x_i).$$

Typical set decoding

Let x be the transmitted word, y the received word, and let $x_1, x_2, \dots, x_{2^{nR}}$ be the $2^{nR}$ codewords.

1. Compute the $2^{nR}$ probabilities $P(\text{received word} = y,\ \text{transmitted word} = x_s)$ for $s \in \{1, \dots, 2^{nR}\}$.
2. If more than one pair or no pair $(x_s, y)$ is ε-jointly typical: decoding failure.
3. Otherwise output the $x_s$ such that $(x_s, y)$ is jointly typical.
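
The random-coding construction of the previous slide together with this decoder can be simulated for a binary symmetric channel. The sketch below (my own, parameters chosen only for illustration; the small block length forces a rather large ε) estimates the failure probability at a rate below capacity.

```python
import numpy as np

# Random coding with typical set decoding over a BSC, uniform input
# distribution.  Here R = 0.2 is below C = 1 - h(p) ~ 0.71, so decoding
# failures should be rare (and become rarer as n grows).
rng = np.random.default_rng(1)
n, p, eps, R, trials = 60, 0.05, 0.25, 0.2, 300
M = int(round(2 ** (n * R)))                  # 2^{nR} = 4096 codewords

h = lambda q: -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def jointly_typical(k):
    # For uniform inputs, conditions (1)-(2) hold automatically and condition
    # (3) only depends on k, the number of positions where x_s and y differ.
    val = (k / n) * (-np.log2(p)) + (1 - k / n) * (-np.log2(1 - p))
    return np.abs(val - h(p)) < eps

failures = 0
for _ in range(trials):
    code = rng.integers(0, 2, size=(M, n))    # random (2^{nR}, n)-code
    sent = 0                                  # transmit the first codeword
    noise = rng.random(n) < p
    y = code[sent] ^ noise
    k = (code != y).sum(axis=1)               # disagreements for every codeword
    typical = np.flatnonzero(jointly_typical(k))
    if typical.size != 1 or typical[0] != sent:
        failures += 1

print(failures / trials)     # a few percent for these parameters
```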

Analysis of the decoder

This decoder can fail for two reasons:
- the right pair (x, y) is not jointly typical (event $E_0$);
- at least one of the $2^{nR} - 1$ pairs $(x_s, y)$ with $x_s \neq x$ is jointly typical (event $E_1$).

$$\mathrm{Prob}(\text{decoding failure}) = \mathrm{Prob}(E_0 \cup E_1) \le \mathrm{Prob}(E_0) + \mathrm{Prob}(E_1).$$

The probability of the first event

$$\mathrm{Prob}(E_0) = \mathrm{Prob}\bigl( (X^{(n)}, Y^{(n)}) \text{ is not typical} \bigr) = 1 - \mathrm{Prob}(T_\epsilon^{(n)}).$$
By using Point 1 of Theorem 3, we obtain, for n sufficiently large,
$$1 - \mathrm{Prob}(T_\epsilon^{(n)}) \le \epsilon.$$

The probability of the second event

$$\mathrm{Prob}(E_1) = \mathrm{Prob}\Bigl( \bigcup_{s : x_s \neq x} \{ (x_s, y) \text{ is typical} \} \Bigr) \le \sum_{s : x_s \neq x} \mathrm{Prob}\bigl( (x_s, y) \text{ is typical} \bigr).$$
For $x_s \neq x$,
$$\mathrm{Prob}\bigl( (x_s, y) \text{ is typical} \bigr) = \mathrm{Prob}\bigl( (\tilde{X}^{(n)}, \tilde{Y}^{(n)}) \text{ is typical} \bigr),$$
where $\tilde{X}^{(n)} \sim X^{(n)}$, $\tilde{Y}^{(n)} \sim Y^{(n)}$ and $(\tilde{X}^{(n)}, \tilde{Y}^{(n)})$ are independent.

The probability of the second event (II)

Therefore
$$\mathrm{Prob}\bigl\{ (\tilde{X}^{(n)}, \tilde{Y}^{(n)}) \text{ is typical} \bigr\} \le 2^{-n(I(X;Y) - 3\epsilon)},$$
and hence
$$\mathrm{Prob}(E_1) \le (2^{nR} - 1)\, 2^{-n(I(X;Y) - 3\epsilon)} \le 2^{-n(I(X;Y) - R - 3\epsilon)}.$$

End of proof

$$\mathrm{Prob}(\text{decoding failure}) \le \epsilon + 2^{-n(I(X;Y) - R - 3\epsilon)}.$$
To conclude, choose the input distribution of X such that $I(X; Y) = C$; then for every $R < C$ (and ε small enough) the right-hand side tends to 0 as $n \to \infty$.