(Classical) Information Theory III: Noisy channel coding


(Classical) Information Theory III: Noisy channel coding Sibasish Ghosh The Institute of Mathematical Sciences CIT Campus, Taramani, Chennai 600 113, India. p. 1

Abstract What is the best possible way to send a message through a noisy channel so that the output of the channel is as close as possible to the input message? Shannon quantified this through the capacity of the channel: the higher the capacity, the closer, on average, the output message can be made to the input one. For this purpose, one has to use the channel many times. The capacity of a channel is given by the maximum of the mutual information between the input and the output of the channel, over all possible input probability distributions. By using random block coding of sufficiently large block length, one can always send messages reliably through the noisy channel at any rate below the capacity of the channel. In the present lecture, we will discuss the capacity of a noisy channel and how to achieve it. p. 2

Outline What is a noisy channel? How does a channel work? Examples of noisy channels Essential idea to find out channel capacity Jointly typical sequences Noisy channel coding theorem and its converse Examples Summary References p. 3

What is a noisy channel? What is a channel: A channel is a device which produces an output when some input is fed into it. A telephone line connecting two (or more) telephone sets, the broadcast of a radio station's programmes to a radio set, internet networking, the nervous system, etc. are examples. Noisy channel: A noisy channel is a channel which, in general, distorts the input messages. While talking over a mobile phone inside a running electric train, we generally cannot hear properly the voice of the person we are talking to. This happens due to the interference of the mobile phone network with the electromagnetic field produced by the overhead electric wire of the train. So, here the em field causes the noise in the mobile networking system. p. 4

How does a channel work? (1) Express the message to be sent through the channel in terms of a string of letters chosen from an a priori fixed alphabet (e.g., the sound wave of your speech over a telephone set gets transformed into a stream of elementary electromagnetic signals). (2) This string of letters is then fed into the channel (the stream of em signals passes through the telephone cable). (3) Another string of letters is produced at the output of the channel (a stream of em signals is reproduced at the receiving end of the telephone line). (4) The output string of letters is then converted into a new message, which may or may not be the same as the original message (the output stream of em signals is then converted back into sound waves, which, in turn, may or may not be the same as the input). p. 5

Examples of noisy channels Example (1) Noiseless binary channel: Under the action of the channel, 0 → 0 and 1 → 1. So each bit can be sent through the channel in an error-free manner. Example (2) Channel with non-overlapping outputs: Under the action of the channel, 0 → 1 with probability 1/2, 0 → 2 with probability 1/2, and 1 → 3 with probability 1/2, 1 → 4 with probability 1/2. Although the correspondence between the inputs and the outputs is not one-to-one, looking at the output, one can uniquely predict the input. So each bit can be sent through the channel in a noise-free manner. p. 6

Examples of noisy... Example (3) Noisy English typing machine: Under the action of the channel, the letters are transformed as: with probability 1/2, a → a while with probability 1/2, a → b; with probability 1/2, b → b while with probability 1/2, b → c; ...; with probability 1/2, y → y while with probability 1/2, y → z; with probability 1/2, z → z while with probability 1/2, z → a. But the punctuation symbols as well as the numbers remain intact. If we consider any paragraph involving only the alternate letters a, c, ..., y (together with the punctuation symbols and numbers), then the paragraph can be recovered without error from the machine's output (b is decoded as a, d as c, and so on). So, in this case the channel works in an effectively noise-free manner on the letters a, c, ..., y. p. 7

Examples of noisy... Example (4) Binary symmetric channel: Given any proper fraction p, under the action of the channel, 0 → 0 with probability (1 − p), 0 → 1 with probability p, and symmetrically, 1 → 1 with probability (1 − p), 1 → 0 with probability p. Looking at the output, it is impossible to predict the input with certainty. So, the bits cannot be sent through this channel in an error-free manner. Example (5) Binary erasure channel: Given any proper fraction p, under the action of the channel, 0 → 0 with probability (1 − p), 0 gets erased with probability p, and 1 → 1 with probability (1 − p), 1 gets erased with probability p. Looking at an erased output, it is impossible to predict the input with certainty. p. 8
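To see how these two channels act on bit strings, here is a minimal simulation sketch in Python (our own illustration; the function names and the erasure marker 'e' are choices made here, not part of the lecture):

import random

def bsc(bits, p):
    # Binary symmetric channel: flip each bit independently with probability p.
    return [b ^ 1 if random.random() < p else b for b in bits]

def bec(bits, p):
    # Binary erasure channel: erase each bit (marked 'e') independently with probability p.
    return ['e' if random.random() < p else b for b in bits]

random.seed(0)
x = [0, 1, 1, 0, 1, 0, 0, 1]
print("input:", x)
print("BSC  :", bsc(x, 0.2))   # some bits flipped
print("BEC  :", bec(x, 0.2))   # some bits erased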

Examples of noisy... Example (6) Symmetric channel: Given any discrete probability distribution {p_1, p_2, ..., p_d} (thus p_1 + p_2 + ... + p_d = 1 and the p_i's are probabilities), under the action of the channel, 1 → 1 with probability p_1, 1 → 2 with probability p_2, ..., 1 → d with probability p_d; 2 → 1 with probability p^{(2)}_1, 2 → 2 with probability p^{(2)}_2, ..., 2 → d with probability p^{(2)}_d; ...; d → 1 with probability p^{(d)}_1, d → 2 with probability p^{(d)}_2, ..., d → d with probability p^{(d)}_d, where (p^{(2)}_1, p^{(2)}_2, ..., p^{(2)}_d), ..., (p^{(d)}_1, p^{(d)}_2, ..., p^{(d)}_d) are permutations of the row (p_1, p_2, ..., p_d), while (p_2, p^{(2)}_2, ..., p^{(d)}_2), ..., (p_d, p^{(2)}_d, ..., p^{(d)}_d) are permutations of the column (p_1, p^{(2)}_1, ..., p^{(d)}_1). Example (4) is a special case of it. p. 9

Essential idea to find out channel capacity A channel is uniquely described by the entire set of transition probabilities Prob(Y = y | X = x) ≡ p_{y|x} for all possible values x of the input variable X and all possible values y of the output variable Y. p. 10

Essential idea to find... Probing the output values y, we would like to know the corresponding input values x. The more that knowing Y reduces our uncertainty about X, the more efficiently we can infer X. Thus, we should look at the mutual information I(X;Y). As, given a channel, we can alter only the input probability distribution, in order to get the maximum possible efficiency out of the channel we should maximize I(X;Y) over all possible input distributions p_x. This maximized I(X;Y) is the capacity of the channel in question. p. 11
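To make this concrete, here is a small Python sketch (our own illustration, not from the lecture) that computes I(X;Y) from an input distribution and a channel transition matrix, and approximates the capacity of a binary-input channel by a grid search over input distributions; the demo uses the channel with non-overlapping outputs of example (2), whose capacity is 1 bit:

import numpy as np

def mutual_information(px, W):
    # I(X;Y) in bits, for input distribution px and transition matrix W[x][y] = p(y|x).
    px = np.asarray(px, dtype=float)
    W = np.asarray(W, dtype=float)
    pxy = px[:, None] * W            # joint distribution p(x, y)
    py = pxy.sum(axis=0)             # output distribution p(y)
    mask = pxy > 0
    ratio = pxy[mask] / (px[:, None] * py[None, :])[mask]
    return float(np.sum(pxy[mask] * np.log2(ratio)))

def capacity_binary_input(W, grid=2001):
    # Approximate C = max over p_x of I(X;Y) by scanning p0 = Prob(X = 0) over a grid.
    return max(mutual_information([p0, 1.0 - p0], W)
               for p0 in np.linspace(0.0, 1.0, grid))

# Example (2): 0 -> {1, 2} and 1 -> {3, 4}, each with probability 1/2.
W = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5]]
print(capacity_binary_input(W))      # ~1.0 bit per channel use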

Essential idea to find... Choosing the appropriate input probability distribution p_x is somewhat similar to choosing the selected subset of letters a, c, ..., y in the noisy English typing machine, so that, looking at the perfectly distinguishable outputs, one can reliably infer the inputs. p. 12

Essential idea to find... The basic idea is to first encode each message i from the message set {1, 2, ..., M} using some encoding procedure X^n(i) to produce (in a one-to-one way) a string of n bits. Then send this n-bit string through the channel to produce another n-bit string Y^n. If X^n(i) is a typical sequence, there are approximately 2^{nH(Y|X)} possible output sequences Y^n (forming a set S_typical, say), each being roughly equally probable. As we want that no two X sequences produce the same Y sequence, and as, in total, there are about 2^{nH(Y)} typical Y sequences, the total number of required disjoint sets S_typical can be at most 2^{n(H(Y) − H(Y|X))} = 2^{nI(X;Y)}. So, we can send at most about 2^{nI(X;Y)} distinguishable X sequences of length n, for sufficiently large n. p. 13
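As a rough numerical illustration (ours, not on the slide), consider a binary symmetric channel with flip probability p = 0.11 and a uniform input distribution:

\[
H(Y \mid X) = H_2(0.11) \approx 0.50, \qquad H(Y) = 1, \qquad I(X;Y) = H(Y) - H(Y \mid X) \approx 0.50 \text{ bits},
\]
\[
\text{so with } n = 100 \text{ channel uses one can hope to distinguish about } 2^{nI(X;Y)} = 2^{50} \approx 10^{15} \text{ input sequences.}
\]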

Jointly typical sequences Discrete channel: Represented by (X, p_{y|x}, Y), with finite input and output sets X, Y and transition probabilities p_{y|x} such that ∑_y p_{y|x} = 1. n-th power discrete memoryless channel (DMC): Represented by (X^n, p_{y_1,y_2,...,y_n | x_1,x_2,...,x_n}, Y^n) where p_{y_k | x_1,x_2,...,x_k; y_1,y_2,...,y_{k-1}} = p_{y_k | x_k} (for k = 1, 2, ..., n), with x_j, y_j being the input and output of the j-th use of the channel. n-th power discrete channel without feedback: Represented by (X^n, p_{y_1,y_2,...,y_n | x_1,x_2,...,x_n}, Y^n) where p_{x_k | x_1,x_2,...,x_{k-1}; y_1,y_2,...,y_{k-1}} = p_{x_k | x_1,x_2,...,x_{k-1}} (for k = 1, 2, ..., n). So, for an n-th power DMC without feedback, we have: p_{y_1,y_2,...,y_n | x_1,x_2,...,x_n} = ∏_{i=1}^{n} p_{y_i | x_i}. p. 14

Jointly typical... An (M, n) code for a discrete channel (X, p_{y|x}, Y) is represented by ({1, 2, ..., M}, X^n : {1, 2, ..., M} → X^n, g : Y^n → {1, 2, ..., M}), X^n being the encoding function, with X^n(1), X^n(2), ..., X^n(M) being the codewords forming the codebook {X^n(1), X^n(2), ..., X^n(M)}, and g being the decoding function. Conditional probability of error for an (M, n) code for a discrete channel (X, p_{y|x}, Y): λ_i ≡ Prob(g(Y^n) ≠ i | X^n = X^n(i)) = ∑_{(y_1,y_2,...,y_n) ∈ Y^n : g((y_1,y_2,...,y_n)) ≠ i} p_{y_1,y_2,...,y_n | x_1(i),x_2(i),...,x_n(i)}, for i = 1, 2, ..., M. Maximal probability of error for an (M, n) code for a discrete channel (X, p_{y|x}, Y): λ^{(n)} ≡ max{λ_i : i = 1, 2, ..., M}. p. 15

Jointly typical... Average probability of error for an (M, n) code: P_error^{(n)} ≡ (1/M) ∑_{i=1}^{M} λ_i. Note that if the index i is chosen uniformly from {1, 2, ..., M}, then P_error^{(n)} equals the overall probability Prob(i ≠ g(Y^n)). Obviously, P_error^{(n)} ≤ λ^{(n)}. It can be shown (by discarding the worst half of the codewords) that smallness of P_error^{(n)} implies smallness of λ^{(n)} at essentially the same rate. Rate of an (M, n) code: R ≡ (log_2 M)/n bits per use of the channel. Achievable rate R: A given rate R is achievable if there exists a sequence of (⌈2^{1R}⌉, 1), (⌈2^{2R}⌉, 2), ..., (⌈2^{nR}⌉, n), ... codes for a discrete channel (X, p_{y|x}, Y) such that λ^{(n)} → 0 as n → ∞, where, for any real number z, ⌈z⌉ is the smallest integer ≥ z. p. 16
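As an illustration of these definitions (our own toy example, not from the lecture), the following Python sketch evaluates λ_i, λ^{(n)}, P_error^{(n)} and R by brute force for an (M, n) = (2, 3) repetition code used over a binary symmetric channel with a maximum-likelihood decoder:

import itertools
import numpy as np

p = 0.1                                   # BSC flip probability
codebook = {1: (0, 0, 0), 2: (1, 1, 1)}   # codewords X^n(i) for i = 1, 2
M, n = len(codebook), 3

def p_yn_given_xn(yn, xn):
    # Memoryless BSC: p(y^n | x^n) = prod_i p(y_i | x_i).
    return np.prod([p if y != x else 1 - p for y, x in zip(yn, xn)])

def g(yn):
    # Maximum-likelihood decoder: pick the index whose codeword makes y^n most probable.
    return max(codebook, key=lambda i: p_yn_given_xn(yn, codebook[i]))

# lambda_i = Prob(g(Y^n) != i | X^n = X^n(i)), summing over all output strings y^n.
lam = {i: sum(p_yn_given_xn(yn, codebook[i])
              for yn in itertools.product((0, 1), repeat=n)
              if g(yn) != i)
       for i in codebook}
lam_max = max(lam.values())            # maximal probability of error lambda^(n)
P_err = sum(lam.values()) / M          # average probability of error
R = np.log2(M) / n                     # rate in bits per channel use
print(lam, lam_max, P_err, R)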

Jointly typical... (Operational) capacity C of a DMC without feedback: It is the supremum of all the achievable rates. Thus any rate R < C yields an arbitrarily small probability of error for sufficiently large block length n. Intuitively, we decode the channel output Y^n as the index i if the codeword X^n(i) is jointly typical with Y^n. But note that even if x^n ≡ (x_1, x_2, ..., x_n) is taken from the typical set A^{(X)}_{(n,ε)} and the corresponding output sequence y^n ≡ (y_1, y_2, ..., y_n) is also in the typical set A^{(Y)}_{(n,ε)}, the pair (x^n, y^n) need not be jointly typical for the joint probability distribution p_{x^n,y^n}! p. 17

Jointly typical... Jointly typical set: A^{(X,Y)}_{(n,ε)} = {(x^n, y^n) ∈ X^n × Y^n : |−(1/n) log_2 p_{x^n} − H(X)| < ε, |−(1/n) log_2 p_{y^n} − H(Y)| < ε, |−(1/n) log_2 p_{x^n,y^n} − H(X,Y)| < ε}, with p_{x^n,y^n} = ∏_{i=1}^{n} p_{x_i,y_i}. p. 18
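The following Python sketch (our own illustration, not part of the lecture) implements this membership test directly from the definition, assuming a strictly positive single-letter joint distribution p_{x,y}:

import numpy as np

def is_jointly_typical(xn, yn, pxy, eps):
    # Membership test for the jointly typical set A^{(X,Y)}_{(n,eps)}, written directly
    # from the definition; pxy[x][y] = p(x, y) is assumed strictly positive here.
    pxy = np.asarray(pxy, dtype=float)
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    HX = -np.sum(px * np.log2(px))
    HY = -np.sum(py * np.log2(py))
    HXY = -np.sum(pxy * np.log2(pxy))
    # empirical -(1/n) log2 probabilities of the observed sequences
    ex = -np.mean([np.log2(px[x]) for x in xn])
    ey = -np.mean([np.log2(py[y]) for y in yn])
    exy = -np.mean([np.log2(pxy[x, y]) for x, y in zip(xn, yn)])
    return abs(ex - HX) < eps and abs(ey - HY) < eps and abs(exy - HXY) < eps

# Demo: a BSC(0.1) with uniform input; a genuine input/output pair is jointly
# typical with high probability for large n.
p = 0.1
pxy = [[0.5 * (1 - p), 0.5 * p],
       [0.5 * p, 0.5 * (1 - p)]]
rng = np.random.default_rng(0)
xn = rng.integers(0, 2, size=10000)
yn = np.where(rng.random(10000) < p, 1 - xn, xn)   # pass x^n through the BSC
print(is_jointly_typical(xn, yn, pxy, eps=0.05))   # True with high probability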

Jointly typical... Joint AEP: Let (X^n, Y^n) be drawn i.i.d. according to p_{x^n,y^n} = ∏_{i=1}^{n} p_{x_i,y_i}. Then (1) Prob(A^{(X,Y)}_{(n,ε)}) → 1 as n → ∞. (2) |A^{(X,Y)}_{(n,ε)}| ≤ 2^{n(H(X,Y)+ε)}. (3) If (X̃^n, Ỹ^n) has probability distribution p_{x^n} p_{y^n} (i.e., the pair is independent but has the same marginals), then Prob((X̃^n, Ỹ^n) ∈ A^{(X,Y)}_{(n,ε)}) ≤ 2^{−n(I(X;Y) − 3ε)}. (4) For sufficiently large n, Prob((X̃^n, Ỹ^n) ∈ A^{(X,Y)}_{(n,ε)}) ≥ (1 − ε) 2^{−n(I(X;Y) + 3ε)}. Using these AEP properties, a proof of Shannon's noisy channel coding theorem can be given. Shannon showed in this proof that the operational capacity of a noisy channel is asymptotically the same as the informational capacity of the channel. p. 19

Noisy channel coding theorem and its converse Given any rate R less than the capacity C, there exists a sequence of (⌈2^{nR}⌉, n) codes for a DMC without feedback with λ^{(n)} → 0. Conversely, for any sequence of (⌈2^{nR}⌉, n) codes for a DMC without feedback with λ^{(n)} → 0, we must have R ≤ C. p. 20

Examples (1) For the noiseless binary channel (example 1), each bit is transmitted undisturbed through the channel, so, looking each time at the channel output, one is able to infer the channel input without making any error. So, one can send exactly one bit undisturbed per single use of the channel, and therefore the capacity C of this channel must be 1 bit. This is easy to show by maximizing I(X;Y) over all possible input probability distributions p_x. Choose, for example, p_0 = 1/2, p_1 = 1/2. p. 21

Example... (2) For the noisy channel with non-overlapping outputs (example 2), each transmitted bit can be recovered, without making any error, by looking at which of the four outputs occurs. So, one can send exactly one bit undisturbed per single use of the channel, and therefore the capacity C of this channel must be 1 bit. This is easy to show by maximizing I(X;Y) over all possible input probability distributions p_x. Choose, for example, p_0 = 1/2, p_1 = 1/2. p. 22

Example... (3) For the noisy English typewriter (example 3), any set of 13 alternate letters (such as a, c, ..., y) can be recovered from the output without error per single use of the channel. So, one can send exactly log_2 13 bits of information intact per single use of the channel, and therefore the capacity C of this channel must be log_2 13 bits. This is easy to show by maximizing I(X;Y) over all possible input probability distributions p_x. Choose, for example, the uniform probability distribution: p_a = p_b = ... = p_z = 1/26. p. 23
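As a quick sanity check of the number (ours, not on the slide): a noiseless 26-letter typewriter could carry log_2 26 bits per use, and the noise costs exactly H(Y|X) = 1 bit, so

\[
C = \log_2 26 - H(Y \mid X) = \log_2 26 - 1 = \log_2 13 \approx 3.70 \text{ bits per use of the channel.}
\]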

Example... (4) For the binary symmetric channel (example 4), I(X;Y) = H(Y) − H(Y|X) = H(Y) − ∑_{x=0,1} p_x H(Y|X = x) = H(Y) − ∑_{x=0,1} p_x H_2(p) (as, for example, Prob(Y = 0 | X = 0) = 1 − p and Prob(Y = 1 | X = 0) = p, and therefore H(Y|X = 0) = H_2(p)) = H(Y) − H_2(p) ≤ 1 − H_2(p) (as Y is here a binary random variable, so H(Y) ≤ log_2 2 = 1). The inequality becomes an equality if p_y is uniform, which holds if and only if p_x is uniform, i.e., p_0 = p_1 = 1/2. So C = 1 − H_2(p). p. 24
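As a concrete numerical check (ours, not on the slide), take p = 0.1:

\[
H_2(0.1) = -0.1\log_2 0.1 - 0.9\log_2 0.9 \approx 0.332 + 0.137 = 0.469,
\qquad C = 1 - H_2(0.1) \approx 0.531 \text{ bits per use of the channel.}
\]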

Summary What do we mean by the classical information content of data: it is given by the Shannon entropy of the probability distribution of the different events of that data. To what extent can one compress data: for long strings of events drawn from the data, it is again given by the Shannon entropy of the probability distribution of the different events of that data, per unit length of the string. What is the maximum rate of transmission of data through a noisy channel: for sending long strings of events through the channel, the maximum rate cannot exceed the capacity, i.e., the maximum over input distributions of the mutual information between the input and the output random variables, per single use of the channel. p. 25

References In this set of three lectures, I have mostly followed the book Elements of Information Theory by Thomas M. Cover and Joy A. Thomas (Wiley-India, 2006). This is an easily readable book. There are a few other good books on classical information theory: Information Theory by R. B. Ash (Interscience, New York, 1965). Information Theory and Reliable Communication by R. G. Gallager (John Wiley and Sons, Inc., New York, 1968). Science and Information Theory by L. Brillouin (Academic Press, New York, 1962). p. 26

References... One may also look at the book Information Theory, Inference, and Learning Algorithms by David J. C. MacKay (Cambridge Univ. Press, 2003), which is freely downloadable from: http://www.inference.phy.cam.ac.uk/mackay/itila/book.html p. 27