Shannon's noisy-channel theorem


Shannon's noisy-channel theorem. Information theory. Amon Elders, Korteweg de Vries Institute for Mathematics, University of Amsterdam. Tuesday, 26th of January.

Noisy channel. During the course we assumed that information sent over a channel was noise-free, that is, information was losslessly transmitted through the channel. Real channels are noisy. We want to find out how to send messages through a noisy channel such that the rate at which messages are sent is maximized while the probability of error stays small.

Outline. 1. The problem of noisy channels: example, definitions. 2. Shannon's noisy-channel theorem: idea of the proof, outline of the proof.

Binary Symmetric Channel. Example: P(y = 0 | x = 0) = 1 − p, P(y = 1 | x = 0) = p, P(y = 0 | x = 1) = p, P(y = 1 | x = 1) = 1 − p.
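As a concrete illustration (not part of the slides), here is a minimal Python sketch of the binary symmetric channel: every transmitted bit is flipped independently with probability p.

```python
import random

def bsc(bits, p, rng=random.Random(0)):
    """Binary symmetric channel: flip each input bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

# Send 100,000 zeros through a BSC with p = 0.1; roughly 10% arrive flipped.
n = 100_000
received = bsc([0] * n, p=0.1)
print(sum(received) / n)   # empirical flip rate, close to 0.1
```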

Discrete memoryless channel. Definition: A discrete memoryless channel (X, P(Y|X), Y) is characterized by an input alphabet X, an output alphabet Y, and a set of conditional probability distributions P(y|x), one for each x ∈ X. These transition probabilities may be written in matrix form: Q_{ji} := P(y = b_j | x = a_i). The n-th extension is the channel (X^n, P(Y^n|X^n), Y^n) with p(y^n | x^n) = ∏_{i=1}^n p(y_i | x_i). Example: the binary symmetric channel.
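A small sketch of the matrix form and of the memoryless n-th extension, assuming a BSC with crossover probability p = 0.1 as the running example (my illustration, not the slides').

```python
import numpy as np

# Transition matrix Q[j, i] = P(y = b_j | x = a_i) for a BSC with crossover p = 0.1.
p = 0.1
Q = np.array([[1 - p, p],
              [p, 1 - p]])

def p_yn_given_xn(y, x, Q):
    """Memoryless n-th extension: p(y^n | x^n) = prod_i p(y_i | x_i)."""
    return float(np.prod([Q[yi, xi] for yi, xi in zip(y, x)]))

print(p_yn_given_xn([0, 1, 0], [0, 0, 0], Q))   # (1-p) * p * (1-p) = 0.081
```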

Definitions. Capacity: We define the information channel capacity of a discrete memoryless channel as C = max_{p(x)} I(X; Y), where p(x) is the probability distribution of X, the random variable over the alphabet X. Block code: An (M, n) code for the channel (X, P(Y|X), Y) consists of the following: an index set {1, 2, ..., M}; an encoding function x^n : {1, 2, ..., M} → X^n, yielding codewords x^n(1), x^n(2), ..., x^n(M) (the set of codewords is called the codebook); and a decoding function g : Y^n → {1, 2, ..., M}, a deterministic rule which assigns a guess to each y ∈ Y^n.
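The following sketch (my addition) evaluates the capacity definition numerically for the BSC by searching over input distributions P(X = 0) = q; the maximum is attained at the uniform input and matches the closed form 1 − H(p).

```python
import numpy as np

def mutual_information(q, p):
    """I(X;Y) in bits for a BSC with crossover p when P(X=0) = q."""
    h = lambda dist: -sum(t * np.log2(t) for t in dist if t > 0)   # entropy of a distribution
    py = [q * (1 - p) + (1 - q) * p, q * p + (1 - q) * (1 - p)]    # output distribution P(y)
    return h(py) - h([p, 1 - p])                                   # I = H(Y) - H(Y|X)

p = 0.1
capacity = max(mutual_information(q, p) for q in np.linspace(0, 1, 1001))
print(round(capacity, 4))                                          # ≈ 0.531, attained at q = 0.5
print(round(1 - (-p * np.log2(p) - (1 - p) * np.log2(1 - p)), 4))  # closed form 1 - H(p)
```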

Formal notions of error. Given that i was sent, the probability of error is λ_i = P(g(Y^n) ≠ i | X^n = x^n(i)). Average probability of error: P_e^(n) = (1/M) ∑_{i=1}^M λ_i. Rate: the rate of an (M, n) code is R = log(M)/n bits per transmission.

Shannon's noisy-channel theorem. Theorem: For a discrete memoryless channel, for every rate R < C there exists a sequence of (2^{nR}, n) codes with maximum probability of error λ^(n) → 0.

Noisy typewriter. Example of the theorem. We take the index set to be {1, 2, ..., 13}, and the encoding function x(1) = a, x(2) = c, ..., x(13) = y, i.e. every other letter of the alphabet. The decoding function maps the received letter to the nearest codeword. Then our rate is R = log(13)/1 = log(13), which is no larger than the capacity (in fact it equals the capacity of this channel), and the error is always zero.
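A hypothetical sketch of this zero-error code, assuming the standard 26-letter noisy typewriter in which each input letter arrives either unchanged or shifted to the next letter, each with probability 1/2 (the channel model is my assumption; the slides only show the picture).

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
CODEBOOK = ALPHABET[::2]          # a, c, e, ..., y: 13 codewords

def channel(letter, rng=random.Random(1)):
    """Noisy typewriter: the letter arrives unchanged or shifted to the next
    letter (wrapping around), each with probability 1/2."""
    i = ALPHABET.index(letter)
    return ALPHABET[(i + rng.randint(0, 1)) % 26]

def decode(received):
    """Nearest-codeword decoding: a codeword decodes to itself; any other
    letter decodes to the letter just before it (which is a codeword)."""
    if received in CODEBOOK:
        return received
    return ALPHABET[(ALPHABET.index(received) - 1) % 26]

# Every codeword is always decoded correctly: the rate is log2(13) ≈ 3.7 bits
# per channel use and the decoding error is exactly zero.
assert all(decode(channel(c)) == c for c in CODEBOOK for _ in range(1000))
print("zero decoding errors")
```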

Idea. For large block lengths, every channel looks like the noisy typewriter: the channel has a subset of inputs that produce essentially disjoint sequences at the output.

Typical sequences. Definition (typical sequence): Let X be a random variable over an alphabet X. A sequence x ∈ X^n of length n is called typical with tolerance β if and only if | (1/n) log(1/p(x)) − H(X) | < β. Example: Suppose we flip a fair coin 10 times; then x := 1111100000 is typical for every β > 0.
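A small sketch (my illustration) of this typicality test for an i.i.d. Bernoulli source: it compares (1/n) log2(1/p(x)) with H(X), which for the fair coin equals 1 bit for every sequence.

```python
import math

def is_typical(x, p_one, beta):
    """Check |(1/n) log2(1/p(x)) - H(X)| < beta for an i.i.d. Bernoulli(p_one) source."""
    n = len(x)
    k = sum(x)                                    # number of ones in the sequence
    log_prob = k * math.log2(p_one) + (n - k) * math.log2(1 - p_one)
    h = -p_one * math.log2(p_one) - (1 - p_one) * math.log2(1 - p_one)
    return abs(-log_prob / n - h) < beta

x = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]                # the sequence from the slide
print(is_typical(x, p_one=0.5, beta=1e-9))        # True: every fair-coin sequence is typical
print(is_typical([1] * 10, p_one=0.3, beta=0.1))  # False: all-ones is atypical for a biased coin
```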

Joint typicality. Definition (jointly typical sequences): Let X, Y be random variables over the alphabets X, Y. Two sequences x ∈ X^n and y ∈ Y^n of length n are called jointly typical with tolerance β if and only if both x and y are typical and | (1/n) log(1/p(x^n, y^n)) − H(X, Y) | < β. We define A_ε^(n) to be the set of jointly typical sequences.

Typicality theorems. Theorem (joint typicality): Let (X^n, Y^n) be sequences of length n drawn i.i.d. according to p(x^n, y^n) = ∏_{i=1}^n p(x_i, y_i). Then:
1. P((X^n, Y^n) ∈ A_ε^(n)) → 1 as n → ∞.
2. (1 − ε) 2^{n(H(X,Y) − ε)} ≤ |A_ε^(n)| ≤ 2^{n(H(X,Y) + ε)}.
3. If instead (X^n, Y^n) ~ p(x^n) p(y^n), i.e. the two sequences are drawn independently, then (1 − ε) 2^{−n(I(X;Y) + 3ε)} ≤ P((X^n, Y^n) ∈ A_ε^(n)) ≤ 2^{−n(I(X;Y) − 3ε)}.
Intuition: 1. Long sequences will almost surely become typical. 2. The size of the set of jointly typical sequences is approximately 2^{nH(X,Y)}. 3. The probability that two independent random sequences are jointly typical is small for large n and depends on the mutual information.

Decoding. Decoding by joint typicality: The probability that a pair of independently drawn typical X^n and Y^n is jointly typical is about 2^{−n I(X;Y)} (part 3 of the theorem). Hence we expect to go through about 2^{n I(X;Y)} such pairs before coming across a jointly typical pair. Thus, if we decode based on joint typicality, the probability that we confuse another codeword with the codeword that actually caused the output Y^n is small as long as we use at most about 2^{n I(X;Y)} codewords.

Outline of proof. Theorem: For a discrete memoryless channel, for every rate R < C there exists a sequence of (2^{nR}, n) codes with maximum probability of error λ^(n) → 0. Proof outline: We select 2^{nR} random codewords and decode by joint typicality. We then analyse the error, of which there are two types: (1) the output Y^n is not jointly typical with the transmitted codeword; (2) there is some other codeword jointly typical with Y^n. The first error type goes to zero by part (1) of the typicality theorem. The second goes to zero if R < C, by the reasoning above.

Proof: creating a code. Fix p(x). Generate 2^{nR} codewords independently at random according to the distribution p(x^n) = ∏_{i=1}^n p(x_i), and assign a codeword X^n(w) to each message w. Note that this code has rate R. Furthermore, we make this code known to both the sender and the receiver.
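A minimal sketch of the random codebook construction (my illustration), assuming a binary input alphabet with p(x) uniform: 2^{nR} codewords of length n are drawn i.i.d. and shared by sender and receiver.

```python
import numpy as np

def random_codebook(n, R, p_one=0.5, seed=0):
    """Draw 2**(n*R) codewords of length n, i.i.d. Bernoulli(p_one).
    Row w of the returned array is the codeword X^n(w)."""
    rng = np.random.default_rng(seed)
    M = int(2 ** (n * R))                    # number of messages
    return (rng.random((M, n)) < p_one).astype(np.uint8)

codebook = random_codebook(n=20, R=0.4)      # 2^(20*0.4) = 256 codewords of length 20
print(codebook.shape)                        # (256, 20)
```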

Decoding. The receiver declares that the message Ŵ was sent if the following conditions are satisfied: (1) (X^n(Ŵ), Y^n) is jointly typical; (2) there is no other index W' ≠ Ŵ such that (X^n(W'), Y^n) is jointly typical. Thus we make a mistake when: (1) the output Y^n is not jointly typical with the transmitted codeword, or (2) there is some other codeword jointly typical with Y^n.
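A sketch of this jointly-typical decoder (my illustration, specialised to a BSC with uniform input, where every binary sequence is marginally typical so only the joint condition needs checking): it declares Ŵ only if exactly one codeword is jointly typical with the received word.

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def jointly_typical(x, y, p, eps):
    """Joint typicality test for a BSC with crossover p and uniform input:
    check |-(1/n) log2 p(x^n, y^n) - H(X, Y)| < eps, with H(X, Y) = 1 + H(p)."""
    n = len(x)
    k = int(np.sum(x != y))                       # positions where the channel flipped
    minus_log_p = n - k * np.log2(p) - (n - k) * np.log2(1 - p)
    return abs(minus_log_p / n - (1 + h2(p))) < eps

def decode(y, codebook, p, eps):
    """Declare w_hat iff exactly one codeword is jointly typical with y."""
    hits = [w for w, x in enumerate(codebook) if jointly_typical(x, y, p, eps)]
    return hits[0] if len(hits) == 1 else None    # None signals a decoding failure

# Tiny end-to-end check: n = 200, p = 0.1, rate R = log2(16)/200 = 0.02 << C ≈ 0.53.
rng = np.random.default_rng(0)
n, M, p = 200, 16, 0.1
codebook = rng.integers(0, 2, size=(M, n))
y = codebook[3] ^ (rng.random(n) < p)             # transmit codeword 3 through the BSC
print(decode(y, codebook, p, eps=0.2))            # prints 3 with high probability
```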

Analysing error (1). By the first part of the typicality theorem we know that for every ε > 0 there exists an N such that for all n ≥ N: P((X^n(W), Y^n) ∉ A_ε^(n)) ≤ ε.

Analysing error (2). By the third part of the typicality theorem, an independently drawn X^n(W') and Y^n are jointly typical with probability at most 2^{−n(I(X;Y) − 3ε)}. There are 2^{nR} − 1 such competing codewords.

Overall error. Thus, with the union bound and the previous slides:
Pr(Error) = Pr(Error_1 ∪ Error_2) ≤ ε + ∑_{i=2}^{2^{nR}} 2^{−n(I(X;Y) − 3ε)} ≤ ε + 2^{−n(I(X;Y) − R − 3ε)}.
If n is sufficiently large and R < I(X;Y) − 3ε we get Pr(Error) ≤ 2ε. If we take our distribution p(x) to be the distribution p*(x) which achieves capacity, we can replace the condition R < I(X;Y) by R < C. This proves the theorem.
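A quick numerical reading of the bound ε + 2^{−n(I(X;Y) − R − 3ε)} (my addition, plugging in I(X;Y) ≈ 0.531 for a BSC with p = 0.1 and uniform input): for any R < I(X;Y) − 3ε the second term dies off geometrically in n, leaving only the ε term.

```python
def error_bound(n, I, R, eps):
    """Random-coding bound from the slide: eps + 2^{-n (I - R - 3 eps)}."""
    return eps + 2 ** (-n * (I - R - 3 * eps))

I, R, eps = 0.531, 0.4, 0.01       # exponent I - R - 3*eps = 0.101 > 0
for n in (10, 100, 1000):
    print(n, error_bound(n, I, R, eps))
# n = 10:   ≈ 0.51  (the bound is still useless for short blocks)
# n = 100:  ≈ 0.011
# n = 1000: ≈ 0.010 (essentially only the eps term remains)
```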

Thanks for listening!