Lecture 6 I. CHANNEL CODING. X n (m) P Y X

Similar documents
Lecture 3: Channel Capacity

Lecture 5: Channel Capacity. Copyright G. Caire (Sample Lectures) 122

EE 4TM4: Digital Communications II. Channel Capacity

Lecture 4 Noisy Channel Coding

Chapter 4. Data Transmission and Channel Capacity. Po-Ning Chen, Professor. Department of Communications Engineering. National Chiao Tung University

Capacity of a channel Shannon s second theorem. Information Theory 1/33

(Classical) Information Theory III: Noisy channel coding

LECTURE 10. Last time: Lecture outline

EE5139R: Problem Set 7 Assigned: 30/09/15, Due: 07/10/15

EE/Stat 376B Handout #5 Network Information Theory October, 14, Homework Set #2 Solutions

Lecture 8: Channel and source-channel coding theorems; BEC & linear codes. 1 Intuitive justification for upper bound on channel capacity

ELEC546 Review of Information Theory

LECTURE 13. Last time: Lecture outline

Lecture 15: Conditional and Joint Typicaility

Shannon s noisy-channel theorem

Shannon s Noisy-Channel Coding Theorem

Solutions to Homework Set #3 Channel and Source coding

Lecture 22: Final Review

Noisy channel communication

Lecture 4 Channel Coding

EE5585 Data Compression May 2, Lecture 27

Feedback Capacity of a Class of Symmetric Finite-State Markov Channels

Notes 3: Stochastic channels and noisy coding theorem bound. 1 Model of information communication and noisy channel

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018

Lecture 4: Proof of Shannon s theorem and an explicit code

Network coding for multicast relation to compression and generalization of Slepian-Wolf

Lecture 8: Shannon s Noise Models

Lecture 11: Polar codes construction

Appendix B Information theory from first principles

Solutions to Homework Set #4 Differential Entropy and Gaussian Channel

Lecture 10: Broadcast Channel and Superposition Coding

National University of Singapore Department of Electrical & Computer Engineering. Examination for

Lecture 5 Channel Coding over Continuous Channels

Entropies & Information Theory

Lecture 11: Quantum Information III - Source Coding

Lecture 2. Capacity of the Gaussian channel

EE376A - Information Theory Final, Monday March 14th 2016 Solutions. Please start answering each question on a new page of the answer booklet.

Solutions to Homework Set #1 Sanov s Theorem, Rate distortion

Noisy-Channel Coding

UCSD ECE 255C Handout #12 Prof. Young-Han Kim Tuesday, February 28, Solutions to Take-Home Midterm (Prepared by Pinar Sen)

EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm

Lecture 14 February 28

Information Theory. Lecture 10. Network Information Theory (CT15); a focus on channel capacity results

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157

Solutions to Set #2 Data Compression, Huffman code and AEP

Lecture 8: Channel Capacity, Continuous Random Variables

Lecture 1: The Multiple Access Channel. Copyright G. Caire 12

Can Feedback Increase the Capacity of the Energy Harvesting Channel?

Shannon s Noisy-Channel Coding Theorem

Solutions to Homework Set #2 Broadcast channel, degraded message set, Csiszar Sum Equality

3F1 Information Theory, Lecture 3

4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information

Lecture 2: August 31

X 1 : X Table 1: Y = X X 2

Superposition Encoding and Partial Decoding Is Optimal for a Class of Z-interference Channels

EE376A: Homeworks #4 Solutions Due on Thursday, February 22, 2018 Please submit on Gradescope. Start every question on a new page.

Homework Set #3 Rates definitions, Channel Coding, Source-Channel coding

Exercise 1. = P(y a 1)P(a 1 )

18.2 Continuous Alphabet (discrete-time, memoryless) Channel

Simultaneous Nonunique Decoding Is Rate-Optimal

The Communication Complexity of Correlation. Prahladh Harsha Rahul Jain David McAllester Jaikumar Radhakrishnan

The Method of Types and Its Application to Information Hiding

Capacity Definitions for General Channels with Receiver Side Information

EC2252 COMMUNICATION THEORY UNIT 5 INFORMATION THEORY

Homework Set #2 Data Compression, Huffman code and AEP

Multicoding Schemes for Interference Channels

Distributed Source Coding Using LDPC Codes

Information Theory CHAPTER. 5.1 Introduction. 5.2 Entropy

A Comparison of Superposition Coding Schemes

Joint Write-Once-Memory and Error-Control Codes

COMPSCI 650 Applied Information Theory Apr 5, Lecture 18. Instructor: Arya Mazumdar Scribe: Hamed Zamani, Hadi Zolfaghari, Fatemeh Rezaei

Information Theory for Wireless Communications. Lecture 10 Discrete Memoryless Multiple Access Channel (DM-MAC): The Converse Theorem

Block 2: Introduction to Information Theory

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy

Source and Channel Coding for Correlated Sources Over Multiuser Channels

LECTURE 3. Last time:

On Function Computation with Privacy and Secrecy Constraints

Lecture 5 - Information theory

Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information

Chapter 2: Entropy and Mutual Information. University of Illinois at Chicago ECE 534, Natasha Devroye

The Poisson Channel with Side Information

The Capacity Region for Multi-source Multi-sink Network Coding

ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016

Interactive Decoding of a Broadcast Message

Channel capacity. Outline : 1. Source entropy 2. Discrete memoryless channel 3. Mutual information 4. Channel capacity 5.

Reliable Computation over Multiple-Access Channels

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1

A Comparison of Two Achievable Rate Regions for the Interference Channel

ECE 4400:693 - Information Theory

List of Figures. Acknowledgements. Abstract 1. 1 Introduction 2. 2 Preliminaries Superposition Coding Block Markov Encoding...

Capacity bounds for multiple access-cognitive interference channel

Chapter 8: Differential entropy. University of Illinois at Chicago ECE 534, Natasha Devroye

Revision of Lecture 5

Keyless authentication in the presence of a simultaneously transmitting adversary

Quantum Information Theory and Cryptography

Performance-based Security for Encoding of Information Signals. FA ( ) Paul Cuff (Princeton University)

ECE Information theory Final (Fall 2008)

On the Capacity of the Two-Hop Half-Duplex Relay Channel

ELEMENT OF INFORMATION THEORY

Lecture 5: Asymptotic Equipartition Property

Transcription:

6- Introduction to Information Theory Lecture 6 Lecturer: Haim Permuter Scribe: Yoav Eisenberg and Yakov Miron I. CHANNEL CODING We consider the following channel coding problem: m = {,2,..,2 nr} Encoder X n m P Y X Y n Decoder ˆmY n Fig.. Channel coding setting. Prior to Shannon results, it was assumed that the error probability of the channel communication in Figure, grows as R grows, where R is the rate transmitted through the channel, i.e., the number of bits transmitted through the channel per one usage of the channel. According to Shannon theorem, as long as we allow a delay such that the encoding is done in blocks of size n, the error probability is arbitrary low for R C and is for R > C, where C is the channel capacity. This is illustraed in Figure 2. Assumption Memoryless property The channel is memoryless, i.e., P y i y i,x i,m = P y i x i,i =,...,n. Assumption 2 Discrete time The channel is Discrete time. We denote a discrete time memoryless channel as: DMC. Definition Code An n,2 nr code for the channel X,P Y X,Y consists of: A message M that is distributed uniformly on {,...,2 nr}. 2 An encoding function f : {,2,...,2 nr} X n, 2

6-2 P er a P er b R C R Fig. 2. a P er R as it was believed prior to Shannon theorem b P er R as derived from Shannon s theorem on capacity of channels yielding codewords x n,x n 2,...,x n 2 nr. The set of codewords is called the codebook. 3 A decoding function g : Y n {,2,...,2 nr}, 3 which is a deterministic rule that assigns a guess to all possible output sequences.. Definition 2 Maximal probability of error The maximal probability of error, P n max, for an n,2 nr code is defined as: P max n = max P M ˆM M = m, 4 m Note that ˆM depends on the chosen codebook since ˆM = gy n and X n = fm. Therefore, we abuse notation when we omit the dependence on the codebook. Definition 3 Average probability of error The average probability of error, P n er, for an n,2 nr code is defined as: P er n = Pr M ˆM 5

6-3 Definition 4 Achievable rate the rate R is achievable if there exists a sequence of n,2 nr codes such that: lim n Pn er = 0. 6 Definition 5 Capacity The capacity is denoted by C, and is defined as the supremum over all achievable rates. In contrast to lossless compression, where one is interested in minimizing the rate, here, higher rate corresponds to more information. Note that Def. 5 is an operational definition, namely, it arises from the communication definition. The next theorem relates the operational definition of capacity to a mathematical quantity and can be calculated. Theorem Channel capacity For a memoryless channel X,P Y X,Y, the capacity is given by where the joint distribution is P Y X. Remarks: The capacity satisfy C log 2 X. The capacity satisfy C log 2 Y. C = max I X;Y, 7 The capacity is concave in. Thus, the maximum can be computed efficiently.

6-4 II. EXAMPLES A. Binary clean channel The binary clean channel has binary input and output alphabets, X = Y = {0,}. The channel is given by Y = X and is described in Figure 3: 0 0 X Y Fig. 3. Binary clean channel It is easy to note that the maximal achievable rate is [bit/channel use]. The channel coding coding theorem asserts this simple observation with: C = max IX;Y = max HX =. 8 B. Noisy channel with non-overlapping outputs The channel has a input alphabets, X = {0,}, and quadratic channel output alphabet Y = {0,,2,3}. The channel is described in Figure 4: The capacity is at most C log X = [bit/channel use]. Moreover, we can achieve the upper bound with px Bern0.5. Formally, from the channel coding coding theorem: C = max IX;Y 9 = max HX HX Y 0 = max HX

6-5 0.5 0 0 X 0.5 0.25 2 Y 0.75 3 Fig. 4. Noisy channel with non-overlapping outputs. =. 2 C. BSC - Binary symmetric channel Consider the BSC, shown in Figure 5: δ 0 0 δ δ δ Fig. 5. BSC - Binary symmetric channel. C = H b δ The output Y, can be written as: Y = X Z, where Z Bernoulliδ and Z X Z and X are independent. Note that if Z =, an error occurred, and Z = 0 means that there is no error. The mutual information is given by: IX;Y = HY HY X

6-6 a = HY HY X X b = HY HZ X c = HY HZ d H b δ 3 Where: H 2 δ is the binary entropy function, i.e.: H 2 δ = δlogδ δlog δ 4 a - Given X, there is a one to one mapping between X and Y. b - Follows from the fact that Y X = Z. c - Follows from the fact that Z X. d - Follows from HY log Y. Note that if one chooses X Ber0.5, equality in d holds, i.e. IX;Y = H b δ. Special cases: Denote by Cδ, the BSC capacity When δ = 0, the channel is clean and C0 =. When δ = 0.5, for all input distributions, the input and the output and are independent. Therefore, the mutual information and the capacity are equal to zero. When δ =, each input is always flipped. However, the decoder is able to produce X by inverting the output Y. Thus, C =. Puzzle: Show from the operational definitions only that the capacity of the BSC satisfy Cδ = C δ. One more example, called the binary erasure channel, is explained in details in the appendix of the lecture. III. PROOF OF THE CONVERSE FOR THE CAPACITY THEOREM For the converse proof of the Coding Theorem, we will need a technical lemma that describes the probabilistic relations between inputs and outputs.

6-7 Lemma Memoryless channel without feedback For a memoryless channel without feedback, Py n x n,m = n Py i x i 5 Remark: When no feedback is available, a memoryless channel can also be defined by 5. Where: Proof of Lemma : Consider the following chain of equalities, i= Py n x n,m = Pyn,x n,m Px n,m = Pm n i= Py i,x i y i,x i,m Px n,m a = Pm n i= Px i y i,x i,mpy i x i,x i,y i,m Px n,m b = Pm n i= Px i x i,mpy i x i Px n,m c = PmPxn m n i= Py i x i Px n,m n = Py i x i 6 i= a Follows from the chain rule. b Follows from the memoryless property in and the Markov chain X i X i,m Y i. c Follows from the probability chain rule. Proof of the converse part of Theorem : In the converse part upper bound on C we need to prove that if R is an achievable rate then R C I maxix;y 7 PX Fix a code n,2 nr with a probability of error P n e. Denote by M the message, which is distributed uniformly over {,...,2 nr }, to be sent. Thus, we have: nr a = HM

6-8 = HM+HM Y n HM Y n b = IM;Y n +HM Y n c = IM;Y n +H M Y n, ˆM d IM;Y n +H M ˆM e IM;Y n ++P e nr f = IM;Y n +n ǫ n g = HY n HY n X n,m+n ǫ n h n [ = H Yi Y i HY i X i ] +n ǫ n 8 i = i= n [HY i HY i X i ]+n ǫ n 9 i= n I Y i ;X i +n ǫ n 20 i= j n C I +n ǫ n 2 Where: a The message distributed uniformly on,2,...,2 nr. Thus,HM = log M = nr. b Definition of Mutual Information c Since ˆM = gy n is a deterministic function of Y n. d Conditioning reduces entropy. e Follows from Fano s inequality: H P e nr. f follows by defining ǫ n = n +P er g X n is a function of the Message M. h Follows from Lemma. M ˆM i Follows from conditioning reduced entropy. + P M ˆM log M = + j Follows by taking the maximum over Px Px 2,...,Px n. The optimization problem is then equal to C I for all i =,...,n.

6-9 Now, dividing both sides of the equation by n, we have R C I +ǫ n. 22 If R is an achievabale rate, then there exists a sequence of codes n,2 nr such that P n e 0 which implies ǫ n 0. Therefore we obtained that if R is achievable, then R C I, and this completes the proof. We will now give a brief section on joint typicality. This tool will be used in the achievability part, where we show the existence of a code with rate that is arbitrarily close to C I and attains vanishing probability of error. IV. JOINT WEAK TYPICALITY Definition 6 Weak typicality The set of ǫ jointly typical sequences with respect to,y is defined by A n ǫ = {x n,y n X n Y n : n logpxn HX ǫ, 23 where px n,y n = n i=,yx i,y i Theorem 2 Let X n,y n be i.i.d. Y x,y then: lim n Pr{X n,y n A n ǫ} = 2 A n ǫ 2n HX,Y+ǫ 3 If Xn, Y n : Xn i.i.d x, Y n i.i.d P Y y, Xn, Y n then Proof: follows from the weak law of large numbers. 2 follows from: n logpyn HY ǫ, 24 n logpxn,y n HX,Y ǫ}, 25 nx n P Y ny n, { Pr X } n,ỹ n A n ǫ 2 nix,y 3 ǫ. 26 = x n,y n n,y nxn,y n

6-0 a n,y n xn,y n x n,y n A n ǫ b x n,y n A n ǫ 2 n HX,Y+ǫ = A n ǫ 2 n HX,Y+ǫ 27 a follows from decreasing the number of summed elements; b Follows from 25. 3 We need to upper bounds the probability that X n,ỹ n is in the set A n ǫ. We do so by summing over the probability of the elements in A n ǫ according to the distribution of X n,ỹ n. [ ] P Xn,Ỹ n A n ǫ = P Xn,Ỹ nxn,y n = x n,y n A n ǫ nx n P Y ny n x n,y n A n ǫ a A n ǫ 2 nhx ǫ 2 nhy ǫ b 2 nhx,y+ǫ 2 nhx ǫ 2 nhy ǫ c = 2 nix;y 3ǫ 28 Where: a - Using the bound on n x n = and P Y ny n according to?? and 24. b - Using the bound on the size of the set, i.e., A n ǫ 2 n HX,Y+ǫ. c - Follows for the definition of mutual information. V. PROOF OF THE ACHIEVABILITY We now prove that if R < C I then for a DMC there exists a sequence of codes 2 n,n such that P n er n 0. The proof is based on a random coding; the code is generated randomly, and then we show that the expectation over all codebooks of the

6- error probability goes to zero. Since the expected value goes to zero, there exists at least one code for which the probability of error goes to zero. Proof of the achievability part of Theorem : Design of the code: We fix x and a rate R, and generate the codebook C, with entries X n i, where X n i is the codeword associated with message i, and: X n i i.i.d x,i =,...,2 nr. Reveal the codebook to the encoder and the decoder. Encoder: For a message i, transmit X n i, that is, the ith codeword in C. Decoder: The decoder receives Y n, and looks for all X n C such that: X n,y n A n ǫ. If there is a unique codeword, X n i that is typical, it declares ˆm = i. In case no message or more than one message are jointly typical with Y n, it declares an error. Analysis of error: Consider P e = P = 2 nr m= = 2 nr M ˆM P M = mp 2 nr m= P M ˆM M = m M ˆM M = m, where the last equality holds since the message is distritbuted uniformly. Because of the symmetry in the code construction with respect to messages, we may assume that m =, and analyze P M ˆM M =. Recall that an error occurs if : E = X n,y n / A n ǫ 2 E j = X n j,y n A n ǫ. The probability of E tends to zero from Theorem 2. Consider the probability of the second error event, 2 nr P E2 = P a j=2 E j 2 nr P E j j=2

6-2 b 2 nr 2 nix;y 3ǫ = = 2 nr IX;Y+3ǫ 29 Where: a Follows from the union bound, e.g., P A B P A+P B. b Follows from Theorem 2. Hence if R < I X;Y, then P Error2 n 0. Note that we showed that E C [P n e ] 0, but we need to show that there exists a single code code with P e 0. This is settled from the fact that if the expectation of a R.V. goes to zero, then there exists an instance of the R.V. that goes to zero as well. Having showed that there exists a code s.t. the average error probability, defined in 3, tends to 0, as n, we will now show that a small average probability of error implies a small maximal probability of error, defined in 2, at essentially the same rate. Theorem 3 Half Codewords Assume that P er = ǫ. There exists a set of codewords that is half of the size, i.e.: 2 nr 2 = 2n R n 30 and P max 2ǫ, where P er and P max are defined in 3 and 2, respectively. Proof: If we throw away the worst half of the codewords, with the highest error probabilities, we will remain with a codebook, consisting of the best half of the codewords. The remaining codewords must have a maximal probability of error less than 2ǫ Otherwise, these codewords themselves would contribute more than 2ǫ to the sum, and P er would be greater than ǫ. If we reindex these codewords, we have 2 nr codewords. Throwing out half the codewords has changed the rate from R to R n, which is negligible for large n. This implies that if achievability holds for the average error probability, it holds for the maximum error probability as well. Example Illustration of proof of Theorem 3 Assume that the average score in a specific class is 30, follows that at least half of the class, scored less than 60.

6-3 Proof: If half of the class scored exactly 60, and the other half scored exactly 0, then the average score is 30. But if more than 50 percent score above 60, the average score is above 30 and we get contradiction. In our example, the score 30 represents ǫ and the score 60 represents 2 ǫ. APPENDIX A BINARY ERASURE CHANNEL BEC A Binary Erasure Channel BEC is a common communications channel model used in coding theory and information theory. In this model, a transmitter sends a bit a zero or a one, and the receiver either receives the bit or it receives a symbol? that the represents that the bit was erased, namely, the receiver knows that a bit was sent but it does not know which one. For instance, in an internet protocol,? may represent that the packet received was corrupted and {0, } may represent tow possible packets. 0 e 0 e X? Y e e Fig. 6. Binary Erasure Channel Where? stands for an erased bit. No-Feedback channel: The capacity of the channel is given by C = max PX IX;Y, and C is known to be the upper bound of the achievable rates. So, we try to find the channel capacity:

6-4 IX;Y = HX HX Y 3 = HX ψ Y Py = ψhx y = ψ 32 = HX Py = 0HX y = 0 Py = HX y = Py =? HX y =? 33 Where? stands for an erased bit See Figure 6. Since the model of the channel suggests that for a successfully recieved bit we know the sent bit with probability of, meaning X is determined by Y given y?, we get HX y = 0 = HX y = = 0. Also, if the bit was erased, by the symmetry of the channel we have no additional information regarding the value of the transmitted bit. Therefore HX Y =? = HX, and Py =? = e by symmetry. Substituting this into Eq. 3: IX;Y = HX Py =? HX y =? = HX e HX = ehx 34 Finally, we need to find the supremum of the mutual information over all the possible distributions of X. C = sup IX;Y = esup HX = e 35 Where the last equation holds for X Bernoulli.Therefore an upper bound on 2 the achievable rate is indeed R C = e, as suggested. 2 Code achieving capacity that uses feedback: Assume the transmitter at timeiknows the previous outputs of the channel,i.e., y i, so we can re-transmit the erased bit. In order to successfully recieve n bits, we must transmit n e bits: First we transmit n bits. Since P e = e and we have feedback, we know that e n bits were erased, so we need to re-transmit them. This time, e en bits were erased, so once again we re-transmit them, and so on. All together, we have transmitted:

6-5 n+en+e 2 n+ = e j n = n j=0 In order to successfully recieve n bits. e j = n e = n e j=0 36 Definition 7 Code Rate In telecommunication and information theory, the code rate R of a channel code is the proportion of the data-stream that is useful non-redundant. That is, if the code rate is R = k/n, for every k bits of useful information, the coder generates totally n bits of data, of which n-k are redundant. In the feedback erasure channel scenario, the ratio of useful information to total sent information was R = n n/ e = e