
Information Theory
http://www.inf.ed.ac.uk/teaching/courses/it/
Week 6: Communication channels and Information
Some notes on the noisy channel setup. Iain Murray, 2012
School of Informatics, University of Edinburgh

Noisy communication was outlined in lecture 1, then set aside while we covered compression: representing messages for a noiseless channel. Why compress, removing all redundancy, just to add redundancy again? Firstly, remember that repetition codes require a lot of repetitions to get a negligible probability of error; we are going to have to add better forms of redundancy to get reliable communication at good rates, and our files won't necessarily have the right sort of redundancy. Secondly, it is often useful to have modular designs: we can design an encoding/decoding scheme for a noisy channel separately from modelling data, and then use a compression system to get our file appropriately distributed over the required alphabet. It is possible to design a combined system that takes redundant files and encodes them for a noisy channel. MN codes do this: http://www.inference.phy.cam.ac.uk/mackay/mncn.pdf. These lectures won't discuss this option.

Noisy channel communication

[Diagram: the input file is compressed to x = 00110001; redundancy is added to give the transmitted message (often called t or x), here three repetitions 00110001 00110001 00110001; the noisy channel corrupts this to 00011001 01110011 10110100; the decoder outputs x̂ = 00110001, which is decompressed. If all errors are corrected, the original file is recovered.]

Discrete Memoryless Channel, Q

Discrete: inputs x and outputs y have discrete (sometimes binary) alphabets, x ∈ A_X and y ∈ A_Y, with

    Q_{j|i} = P(y=b_j | x=a_i)

Memoryless: outputs are always drawn using the fixed Q matrix. We also assume the channel is synchronized.
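As a minimal sketch of this convention, here is the channel matrix for a binary symmetric channel in Python/NumPy (the flip probability f = 0.1 is an illustrative value, not from the notes). Each column of Q is a distribution over outputs for one fixed input symbol.

# Python (NumPy)
import numpy as np

# Q[j, i] = P(y = b_j | x = a_i) for a BSC with flip probability f.
f = 0.1
Q = np.array([[1 - f, f],
              [f, 1 - f]])

# Every column is a distribution over outputs, so each column sums to 1.
assert np.allclose(Q.sum(axis=0), 1.0)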

Synchronized channels

We know that a sequence of inputs was sent and which outputs go with them. Dealing with insertions and deletions is a tricky topic, an active area of research that we will avoid.

Binary Symmetric Channel (BSC)

A natural model channel for binary data. Alternative view: noise drawn from

    p(n) = 1−f if n = 0,   f if n = 1

is added to the input:

    y = (x + n) mod 2 = x XOR n

% Matlab/Octave
y = mod(x+n, 2);
y = bitxor(x, n);

/* C (or Python) */
y = (x+n) % 2;
y = x ^ n;

Binary Erasure Channel (BEC)

An example of a non-binary alphabet: symbols are either received correctly or replaced with an erasure placeholder. With this channel corruptions are obvious. Feedback: the receiver could ask for retransmission, although care is required, as the negotiation could be corrupted too, and feedback is sometimes not an option (e.g. hard disk storage). The BEC is not the deletion channel: here symbols are replaced with a placeholder, whereas in the deletion channel they are removed entirely and it is no longer clear at what time symbols were transmitted.

Z channel

Cannot always treat symbols symmetrically: ink gets rubbed off, but never added.
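For intuition, a rough simulation sketch of the BSC and BEC in Python/NumPy (the flip/erasure probability f = 0.1 and the use of -1 as the erasure placeholder are illustrative choices, not from the notes):

# Python (NumPy)
import numpy as np

rng = np.random.default_rng(0)
f = 0.1                            # illustrative flip / erasure probability
x = rng.integers(0, 2, size=20)    # a random binary input sequence

# BSC: draw noise n ~ Bernoulli(f) and add it modulo 2 (i.e. XOR with x).
n = (rng.random(x.shape) < f).astype(int)
y_bsc = np.bitwise_xor(x, n)

# BEC: each symbol is independently replaced by an erasure placeholder
# with probability f; surviving symbols are never flipped.
erased = rng.random(x.shape) < f
y_bec = np.where(erased, -1, x)    # -1 stands in for the erasure symbol '?'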

Channel Probabilities

Channel definition: Q_{j|i} = P(y=b_j | x=a_i). Assume there's nothing we can do about Q; we can choose what to throw at the channel.

Input distribution:   p_X = (P(x=a_1), ..., P(x=a_I)), a column vector
Joint distribution:   P(x, y) = P(x) P(y|x)
Output distribution:  P(y) = Σ_x P(x, y)
(the usual relationships for any two variables x and y)

A little more detail on why the output distribution can be found by a matrix multiplication:

    p_{Y,j} = P(y=b_j) = Σ_i P(y=b_j, x=a_i)
            = Σ_i P(y=b_j | x=a_i) P(x=a_i)
            = Σ_i Q_{j|i} p_{X,i}

or, in vector notation, p_Y = Q p_X. Care: some texts (but not MacKay) use the transpose of our Q as the transition matrix, and so use left-multiplication instead.

Channels and Information

Three distributions: P(x), P(y), P(x, y). Three observers: sender, receiver, omniscient outsider.

Average surprise of receiver: H(Y) = Σ_y P(y) log 1/P(y). This is partial information about the sent file and the added noise.

Average information of file: H(X) = Σ_x P(x) log 1/P(x). The sender observes all of this, but has no information about the noise.

Joint Entropy

The omniscient outsider gets more information on average than an observer at one end of the channel: H(X, Y) ≥ H(X). The outsider can't have more information than both ends combined: H(X, Y) ≤ H(X) + H(Y), with equality only if X and Y are independent (independence useless for communication!). The omniscient outsider experiences the total joint entropy of file and noise:

    H(X, Y) = Σ_{x,y} P(x, y) log 1/P(x, y)
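The matrix view translates directly into a few lines of NumPy. A sketch for the BSC (flip probability f = 0.1 and a uniform input distribution are assumed for illustration), computing p_Y = Q p_X, the joint distribution and the three entropies:

# Python (NumPy)
import numpy as np

def entropy(p):
    """Entropy in bits of a distribution given as an array of probabilities."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

f = 0.1
Q = np.array([[1 - f, f],
              [f, 1 - f]])         # Q[j, i] = P(y=b_j | x=a_i)
p_X = np.array([0.5, 0.5])         # input distribution: ours to choose

p_Y = Q @ p_X                      # output distribution, p_Y = Q p_X
P_XY = Q * p_X                     # joint P(x,y): entry [j, i] = Q[j, i] * p_X[i]

H_X, H_Y, H_XY = entropy(p_X), entropy(p_Y), entropy(P_XY.ravel())
# The bounds from the slides: H(X) <= H(X,Y) <= H(X) + H(Y).
assert H_X - 1e-12 <= H_XY <= H_X + H_Y + 1e-12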

Mutual Information (1)

How much bigger is H(X) + H(Y) than H(X, Y)? The overlap,

    I(X; Y) = H(X) + H(Y) − H(X, Y),

is called the mutual information. It's the average information content shared by the dependent X and Y ensembles. (more insight to come)

Inference in the channel

The receiver doesn't know x, but on receiving y can update the prior P(x) to a posterior:

    P(x|y) = P(x, y) / P(y) = P(y|x) P(x) / P(y)

e.g. for a BSC with P(x=1) = 0.5, receiving y = 0 gives P(x=0|y=0) = 1−f and P(x=1|y=0) = f. Other channels may have less obvious posteriors. The posterior is another distribution we can compute the entropy of!

Conditional Entropy (1)

We can condition every part of an expression on the setting of an arbitrary variable:

    H(X|y) = Σ_x P(x|y) log 1/P(x|y)

the average information available from seeing x, given that we already know y. On average this is written:

    H(X|Y) = Σ_y P(y) H(X|y) = Σ_{x,y} P(x, y) log 1/P(x|y)

Conditional Entropy (2)

Similarly,

    H(Y|X) = Σ_{x,y} P(x, y) log 1/P(y|x)

is the average uncertainty about the output that the sender has, given that she knows what she sent over the channel. Intuitively this should be less than the average surprise that the receiver will experience, H(Y).
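Continuing the same BSC sketch (f = 0.1 and a uniform input, both illustrative), the posterior and the two conditional entropies can be computed directly from Q and p_X:

# Python (NumPy)
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

f = 0.1
Q = np.array([[1 - f, f],
              [f, 1 - f]])
p_X = np.array([0.5, 0.5])

P_XY = Q * p_X                       # joint P(x,y), indexed [y, x]
p_Y = P_XY.sum(axis=1)               # marginal output distribution
posterior = P_XY / p_Y[:, None]      # P(x|y) = P(x,y) / P(y), one row per y

# H(X|Y) = sum_y P(y) H(X|y), and H(Y|X) = sum_x P(x) H(Y|x).
H_X_given_Y = sum(p_Y[j] * entropy(posterior[j]) for j in range(len(p_Y)))
H_Y_given_X = sum(p_X[i] * entropy(Q[:, i]) for i in range(len(p_X)))
# For this symmetric case both equal H_2(f), and both are less than H(Y) = 1 bit.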

Conditional Entropy (3)

The chain rule for entropy:

    H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

The average coding cost of a pair is the same regardless of whether you treat them as a joint event, or code one and then the other. Proof:

    H(X, Y) = Σ_x Σ_y p(x) p(y|x) [ log 1/p(x) + log 1/p(y|x) ]
            = Σ_x p(x) log 1/p(x) Σ_y p(y|x) + Σ_{x,y} p(x, y) log 1/p(y|x)
            = H(X) + H(Y|X)        (since Σ_y p(y|x) = 1)

Mutual Information (2)

The receiver thinks: I(X; Y) = H(X) − H(X|Y). The mutual information is, on average, the information content of the input minus the part that is still uncertain after seeing the output. That is, the average information that we can get about the input over the channel.

    I(X; Y) = H(Y) − H(Y|X)

is often easier to calculate.

The Capacity

Where are we going? I(X; Y) depends on the channel and the input distribution p_X. The Capacity:

    C(Q) = max_{p_X} I(X; Y)

C gives the maximum average amount of information we can get in one use of the channel. We will see that reliable communication is possible at C bits per channel use.

Lots of new definitions

When dealing with extended ensembles, independent identical copies of an ensemble, entropies were easy: H(X^K) = K H(X). Dealing with channels forces us to extend our notions of information to collections of dependent variables. For every joint, conditional and marginal probability we have a different entropy, and we'll want to understand their relationships. Unfortunately this meant seeing a lot of definitions at once. They are summarized on pp. 138-139 of MacKay, and also in the following tables.
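A rough sketch of the capacity definition for the BSC: I(X; Y) = H(Y) − H(Y|X) is evaluated over a grid of input distributions [a, 1−a] and the maximum taken. The grid search is only for illustration (not a general capacity algorithm), and f = 0.1 is again an assumed value; for the BSC the maximum is at the uniform input and equals 1 − H_2(f).

# Python (NumPy)
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(Q, p_X):
    """I(X;Y) = H(Y) - H(Y|X) for channel matrix Q and input distribution p_X."""
    H_Y = entropy(Q @ p_X)
    H_Y_given_X = sum(p_X[i] * entropy(Q[:, i]) for i in range(len(p_X)))
    return H_Y - H_Y_given_X

f = 0.1
Q = np.array([[1 - f, f],
              [f, 1 - f]])

# Crude search over binary input distributions p_X = [a, 1-a].
grid = np.linspace(0.0, 1.0, 1001)
C = max(mutual_information(Q, np.array([a, 1 - a])) for a in grid)

print(C)                                    # ~0.531 bits per channel use
print(1 - entropy(np.array([f, 1 - f])))    # 1 - H_2(0.1), the BSC capacity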

The probabilities associated with a channel

Very little of this is special to channels; it's mostly results for any pair of dependent random variables.

Distribution   Where from?                Interpretation / Name
P(x)           We choose                  Input distribution
P(y|x)         Q, channel definition      Channel noise model. Sender's beliefs about output
P(x, y)        p(y|x) p(x)                Omniscient outside observer's joint distribution
P(y)           Σ_x p(x, y) = Q p_X        (Marginal) output distribution
P(x|y)         p(y|x) p(x) / p(y)         Receiver's beliefs about input. Inference

Corresponding information measures

Measure     Definition                            Interpretation / Name
H(X)        Σ_x p(x) log 1/p(x)                   Ave. info. content of source. Sender's ave. surprise on seeing x
H(Y)        Σ_y p(y) log 1/p(y)                   Ave. info. content of output. Partial info. about x and noise. Ave. surprise of receiver
H(X, Y)     Σ_{x,y} p(x, y) log 1/p(x, y)         Ave. info. content of (x, y), or source and noise. Ave. surprise of outsider
H(X|y)      Σ_x p(x|y) log 1/p(x|y)               Uncertainty after seeing output
H(X|Y)      Σ_{x,y} p(x, y) log 1/p(x|y)          Average, E_{p(y)}[H(X|y)]
H(Y|X)      Σ_{x,y} p(x, y) log 1/p(y|x)          Sender's ave. uncertainty about y
I(X; Y)     H(X) + H(Y) − H(X, Y)                 Overlap in ave. info. contents
            H(X) − H(X|Y)                         Ave. uncertainty reduction by y. Ave. info. about x over channel
            H(Y) − H(Y|X)                         Often easier to calculate

And review the diagram relating all these quantities!

Ternary confusion channel

Assume p_X = [1/3, 1/3, 1/3]. What is I(X; Y)?

    I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = 1 − 1/3 = 2/3

Optimal input distribution: p_X = [1/2, 0, 1/2], for which I(X; Y) = 1, the capacity of the channel.
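The channel diagram for the ternary confusion channel is on the slides rather than in this transcription; the sketch below assumes a matrix consistent with the numbers above (the two outer inputs are received noiselessly, the middle input goes to either output with probability 1/2):

# Python (NumPy)
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(Q, p_X):
    H_Y = entropy(Q @ p_X)
    H_Y_given_X = sum(p_X[i] * entropy(Q[:, i]) for i in range(len(p_X)))
    return H_Y - H_Y_given_X

# Assumed channel: inputs {a_1, a_2, a_3}, outputs {0, 1};
# a_1 -> 0 and a_3 -> 1 reliably, a_2 -> 0 or 1 with probability 1/2 each.
Q = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])

print(mutual_information(Q, np.array([1/3, 1/3, 1/3])))   # 2/3 bit
print(mutual_information(Q, np.array([0.5, 0.0, 0.5])))   # 1 bit: the capacity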