Lecture 8: Channel Capacity, Continuous Random Variables


EE376A/STATS376A Information Theory                                   Lecture 8 - 02/01/2018
Lecturer: Tsachy Weissman        Scribes: Augustine Chemparathy, Adithya Ganesh, Philip Hwang

1 Channel Capacity

Given a channel with inputs X and outputs Y with transition probability P(Y|X):

Define: The channel capacity C is the maximal rate of reliable communication over the memoryless channel characterized by P(Y|X). Further, recall the following definition:

    C^{(I)} = \max_{P_X} I(X; Y).

Theorem. C = C^{(I)}.

Proof: We will see this proof in the coming lectures.

This theorem is important because C is challenging to optimize over directly, whereas C^{(I)} is a tractable optimization problem.

1.1 Examples

Example I. Channel capacity of a Binary Symmetric Channel (BSC). Define alphabets X = Y = {0, 1}. A BSC is defined by the PMF

    P_{Y|X}(y|x) = \begin{cases} p & y \neq x \\ 1 - p & y = x. \end{cases}

This is equivalent to the channel matrix

    \begin{pmatrix} 1 - p & p \\ p & 1 - p \end{pmatrix}

The rows of the matrix correspond to input symbols 0 and 1, while the columns correspond to output symbols 0 and 1. The channel also has a graph representation, with an edge from each input symbol to each output symbol labeled by the corresponding transition probability (figure omitted).
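Since C^{(I)} is a concrete optimization over input distributions, it can be evaluated numerically when the alphabets are small. The following Python sketch is not part of the original notes (the function names are illustrative): it computes I(X;Y) in bits from an input PMF and a channel matrix, then grid-searches over binary input distributions. Applied to the BSC matrix above, it recovers the capacity value derived in the next part.

```python
import numpy as np

def mutual_information(p_x, W):
    """I(X;Y) in bits for input pmf p_x and channel matrix W, where W[i, j] = P(Y=j | X=i)."""
    p_xy = p_x[:, None] * W                 # joint pmf P(X=i, Y=j)
    p_y = p_xy.sum(axis=0)                  # output marginal
    mask = p_xy > 0                         # convention 0 log 0 = 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y[None, :])[mask]))

p = 0.1
W = np.array([[1 - p, p],
              [p, 1 - p]])                  # the BSC(p) channel matrix from the text

# For a binary input, C^(I) = max over q of I(X;Y) with X ~ Ber(q); a fine grid suffices.
qs = np.linspace(0.0, 1.0, 1001)
vals = [mutual_information(np.array([1 - q, q]), W) for q in qs]
print("C^(I) ~", max(vals), "attained near q =", qs[int(np.argmax(vals))])
```

For larger alphabets one would replace the grid search with a proper optimization, but the point here is only that C^{(I)} is a well-defined, computable quantity.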

The BSC can also be expressed in the form of additive noise:

    Y = X \oplus Z,

where Z ~ Ber(p), \oplus denotes addition mod 2, and Z is independent of X. To determine the channel capacity of a BSC, by the theorem we must maximize the mutual information

    I(X; Y) = H(Y) - H(Y|X) = H(Y) - H(X \oplus Z \mid X).

Once we condition on X, the uncertainty in X \oplus Z is the same as the uncertainty in Z. Formally, we can simplify the second term (this can also be shown by separately considering the cases X = 0 and X = 1):

    I(X; Y) = H(Y) - H(Z) = H(Y) - h_2(p) \leq 1 - h_2(p),

where h_2(p) is the binary entropy function. Taking X ~ Ber(1/2) achieves equality, I(X; Y) = 1 - h_2(p) (note: by symmetry, X ~ Ber(1/2) implies Y ~ Ber(1/2)). Thus,

    C = 1 - h_2(p).
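As a sanity check on the additive-noise view, here is a small Monte Carlo sketch (not from the original notes; all names are illustrative): it simulates Y = X \oplus Z with the capacity-achieving input X ~ Ber(1/2), estimates I(X;Y) from the empirical joint PMF, and compares the result with 1 - h_2(p).

```python
import numpy as np

rng = np.random.default_rng(0)

def h2(p):
    """Binary entropy in bits, with the convention 0 log 0 = 0."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

n, p = 1_000_000, 0.1
x = rng.integers(0, 2, size=n)              # X ~ Ber(1/2), capacity-achieving input
z = (rng.random(n) < p).astype(int)         # Z ~ Ber(p), independent of X
y = x ^ z                                   # Y = X xor Z (addition mod 2)

# Empirical mutual information from the empirical joint pmf of (X, Y).
joint = np.zeros((2, 2))
for a in range(2):
    for b in range(2):
        joint[a, b] = np.mean((x == a) & (y == b))
px, py = joint.sum(axis=1), joint.sum(axis=0)
mi = sum(joint[a, b] * np.log2(joint[a, b] / (px[a] * py[b]))
         for a in range(2) for b in range(2) if joint[a, b] > 0)

print(f"empirical I(X;Y): {mi:.4f} bits")
print(f"1 - h2(p):        {1 - h2(p):.4f} bits")
```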

Example II. Channel capacity of a Binary Erasure Channel (BEC). Define alphabets X = {0, 1} and Y = {0, 1, e}, where e stands for erasure. Any input symbol X_i has probability 1 - \alpha of being retained in the output sequence and probability \alpha of being erased. Schematically, 0 maps to 0 and 1 maps to 1, each with probability 1 - \alpha, and either input maps to e with probability \alpha (figure omitted).

Examining the mutual information, we have

    I(X; Y) = H(X) - H(X|Y)
            = H(X) - [H(X|Y = e) P(Y = e) + H(X|Y = 0) P(Y = 0) + H(X|Y = 1) P(Y = 1)]
            = H(X) - [H(X) \cdot \alpha + 0 \cdot P(Y = 0) + 0 \cdot P(Y = 1)]
            = (1 - \alpha) H(X).

Because the entropy of a binary random variable can be no larger than 1,

    (1 - \alpha) H(X) \leq 1 - \alpha.

Equality is achieved when H(X) = 1, that is, X ~ Ber(1/2). Thus, the capacity of the BEC is C = 1 - \alpha.

Note that if we knew exactly which positions were going to be erased, we could communicate at this rate by sending the input bits at exactly the unerased positions (since the expected fraction of erasures is \alpha). The fact that C = 1 - \alpha indicates that we can achieve this rate even when we do not know which positions are going to be erased.

2 Information of Continuous Random Variables

The channel capacity theorem also holds for continuous-valued channels, which are very important in a number of practical scenarios, e.g., in wireless communication. But before studying such channels, we need to extend notions like entropy and mutual information to continuous random variables.

Definition: The relative entropy between two probability density functions f and g is given by

    D(f \| g) = \int f(x) \log \frac{f(x)}{g(x)} \, dx.

Exercise: Show that D(f \| g) \geq 0, with equality if and only if f = g.

Proof. Observe that

    -D(f \| g) = -\int f(x) \log \frac{f(x)}{g(x)} \, dx
               = \int f(x) \log \frac{g(x)}{f(x)} \, dx
               = E\left[ \log \frac{g(X)}{f(X)} \right]
               \leq \log E\left[ \frac{g(X)}{f(X)} \right]
               = \log \int f(x) \frac{g(x)}{f(x)} \, dx
               = \log \int g(x) \, dx
               = \log 1 = 0,

where the inequality is Jensen's inequality applied to the concave function log. Equality in Jensen's inequality occurs when g(X)/f(X) is constant, i.e., when f = g.

Definition: The mutual information between X and Y that have a joint probability density function f_{X,Y} is

    I(X; Y) = D(f_{X,Y} \| f_X f_Y).
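The non-negativity of relative entropy is easy to check numerically. The sketch below (an illustration, not part of the notes) approximates D(f \| g) by a Riemann sum on a fine grid for two Gaussian densities, verifies it is strictly positive when f \neq g and zero when f = g, and compares against the standard closed form for the Gaussian-to-Gaussian divergence.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def relative_entropy(f, g, x):
    """D(f||g) in bits, approximated by a Riemann sum on the grid x (0 log 0 = 0)."""
    dx = x[1] - x[0]
    mask = f > 0
    return np.sum(f[mask] * np.log2(f[mask] / g[mask])) * dx

x = np.linspace(-30.0, 30.0, 600_001)
f = gaussian_pdf(x, 0.0, 1.0)
g = gaussian_pdf(x, 1.0, 2.0)

print(relative_entropy(f, g, x))   # strictly positive, since f != g
print(relative_entropy(f, f, x))   # 0.0, since f = g

# Closed form for D(N(m1,s1^2) || N(m2,s2^2)) in nats:
#   log(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 s2^2) - 1/2
kl_nats = np.log(2.0 / 1.0) + (1.0 + 1.0) / (2 * 4.0) - 0.5
print(kl_nats / np.log(2))         # same value, in bits
```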

Definition: The differential entropy of a continuous random variable X with probability density function f_X is

    h(X) = -\int f_X(x) \log f_X(x) \, dx = E[-\log f_X(X)].

If X, Y have joint density f_{X,Y}, the conditional differential entropy is

    h(X|Y) = -\int\!\!\int f_{X,Y}(x, y) \log f_{X|Y}(x|y) \, dx \, dy = E[-\log f_{X|Y}(X|Y)],

and the joint differential entropy is

    h(X, Y) = -\int\!\!\int f_{X,Y}(x, y) \log f_{X,Y}(x, y) \, dx \, dy = E[-\log f_{X,Y}(X, Y)].

2.1 Exercises

Exercise 1. Show that

    h(X|Y) \leq h(X),

with equality iff X and Y are independent.

Proof. This follows from Exercise 2 below combined with the fact that I(X; Y) \geq 0, which holds since the relative entropy is non-negative (see the exercise above). Equality holds exactly when f_{X,Y}(x, y) = f_X(x) f_Y(y), i.e., when X and Y are independent.

Exercise 2. Show that

    I(X; Y) = h(X) - h(X|Y) = h(Y) - h(Y|X) = h(X) + h(Y) - h(X, Y).

Proof.

    I(X; Y) = \int\!\!\int f_{X,Y}(x, y) \log \frac{f_{X,Y}(x, y)}{f_X(x) f_Y(y)} \, dx \, dy
            = \int\!\!\int f_{X,Y}(x, y) \log \frac{f_{X,Y}(x, y)}{f_X(x)} \, dx \, dy - \int\!\!\int f_{X,Y}(x, y) \log f_Y(y) \, dx \, dy
            = \int f_X(x) \left[ \int f_{Y|X}(y|x) \log f_{Y|X}(y|x) \, dy \right] dx - \int f_Y(y) \log f_Y(y) \, dy
            = -h(Y|X) + h(Y) = h(Y) - h(Y|X).

Symmetrically, the same argument shows that I(X; Y) = h(X) - h(X|Y). Also,

    I(X; Y) = \int\!\!\int f_{X,Y}(x, y) \log \frac{f_{X,Y}(x, y)}{f_X(x) f_Y(y)} \, dx \, dy
            = \int\!\!\int f_{X,Y}(x, y) \log f_{X,Y}(x, y) \, dx \, dy - \int\!\!\int f_{X,Y}(x, y) \log f_X(x) \, dx \, dy - \int\!\!\int f_{X,Y}(x, y) \log f_Y(y) \, dx \, dy
            = -h(X, Y) + h(X) + h(Y) = h(X) + h(Y) - h(X, Y).
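The identity I(X;Y) = h(X) + h(Y) - h(X,Y) can be checked numerically. The following sketch is an assumption-laden illustration (not from the notes): it samples a standard bivariate Gaussian with correlation \rho, estimates each differential entropy as a sample mean of -\log f over the samples, and compares the combination against -\frac{1}{2}\log(1 - \rho^2), the well-known mutual information of a bivariate Gaussian pair.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n = 0.8, 2_000_000
log2e = np.log2(np.e)

# Sample (X, Y) jointly Gaussian, zero mean, unit variances, correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
x, y = xy[:, 0], xy[:, 1]

def h_gauss_1d(samples):
    """h(X) = E[-log f_X(X)] for X ~ N(0,1), estimated by a sample mean (bits)."""
    return np.mean(samples ** 2 / 2 + 0.5 * np.log(2 * np.pi)) * log2e

def h_gauss_2d(x, y, rho):
    """h(X,Y) = E[-log f_{X,Y}(X,Y)] for the bivariate Gaussian above (bits)."""
    q = (x ** 2 - 2 * rho * x * y + y ** 2) / (1 - rho ** 2)
    return np.mean(q / 2 + np.log(2 * np.pi) + 0.5 * np.log(1 - rho ** 2)) * log2e

mi_est = h_gauss_1d(x) + h_gauss_1d(y) - h_gauss_2d(x, y, rho)
print(f"h(X) + h(Y) - h(X,Y) ~ {mi_est:.4f} bits")
print(f"-0.5 log2(1 - rho^2) = {-0.5 * np.log2(1 - rho ** 2):.4f} bits")
```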

Exercise 3. Show that

    h(X + c) = h(X)    and    h(cX) = h(X) + \log |c|,   c \neq 0.

Proof. We will combine them and prove that h(aX + b) = h(X) + \log |a| whenever a \neq 0. Let Y = aX + b. Then the pdf of Y can be written as

    f_Y(y) = \frac{1}{|a|} f_X\left( \frac{y - b}{a} \right).

Thus,

    h(aX + b) = h(Y) = -\int f_Y(y) \log f_Y(y) \, dy
              = -\int \frac{1}{|a|} f_X\left( \frac{y - b}{a} \right) \log \left( \frac{1}{|a|} f_X\left( \frac{y - b}{a} \right) \right) dy.

Changing variable to x = (y - b)/a (and reversing the limits of integration if a < 0),

    h(aX + b) = -\int f_X(x) \log \left( \frac{1}{|a|} f_X(x) \right) dx
              = -\int f_X(x) \log \frac{1}{|a|} \, dx - \int f_X(x) \log f_X(x) \, dx
              = h(X) + \log |a|.

2.2 Examples

Example I: Differential entropy of a uniform random variable X ~ U(a, b). The pdf of a uniform random variable is

    f_X(x) = \begin{cases} \frac{1}{b - a} & a \leq x \leq b \\ 0 & \text{otherwise.} \end{cases}

The differential entropy is simply

    h(X) = E[-\log f_X(X)] = \log(b - a).

Notice that the differential entropy can be negative or positive depending on whether b - a is less than or greater than 1. In practice, because of this property, differential entropy is usually used as a means to determine mutual information and does not have much operational significance by itself.
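The two points above, that differential entropy can be negative and that it shifts by \log|c| under scaling, are easy to illustrate numerically. The sketch below (illustrative only, not from the notes) prints h(U(a,b)) = \log_2(b-a) for intervals of different widths and then checks h(cX) - h(X) = \log_2|c| by Monte Carlo for a Gaussian X.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000_000
log2e = np.log2(np.e)

# Differential entropy of U(a, b) is log2(b - a): negative whenever b - a < 1.
for a, b in [(0.0, 0.5), (0.0, 1.0), (0.0, 4.0)]:
    print(f"h(U({a}, {b})) = {np.log2(b - a):+.3f} bits")

# Scaling property h(cX) = h(X) + log2|c|, checked by Monte Carlo for X ~ N(0, 1).
def h_normal_mc(samples, sigma):
    """Estimate h of N(0, sigma^2) as the sample mean of -log f(samples), in bits."""
    neg_log_f = samples ** 2 / (2 * sigma ** 2) + 0.5 * np.log(2 * np.pi * sigma ** 2)
    return np.mean(neg_log_f) * log2e

c = -3.0
x = rng.normal(0.0, 1.0, size=n)
h_x = h_normal_mc(x, 1.0)
h_cx = h_normal_mc(c * x, abs(c))          # cX ~ N(0, c^2)
print(f"h(cX) - h(X) = {h_cx - h_x:.4f} bits,  log2|c| = {np.log2(abs(c)):.4f} bits")
```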

Example II: Differential entropy of a Gaussian random variable X ~ N(0, \sigma^2). The pdf of a Gaussian random variable is

    f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2}{2\sigma^2}}.

The differential entropy is h(X) = E[-\log f(X)]. For simplicity, convert the base to e:

    h(X) = E[-\ln f(X)]
         = E\left[ \frac{1}{2} \ln(2\pi\sigma^2) + \frac{X^2}{2\sigma^2} \right]
         = \frac{1}{2} \ln(2\pi\sigma^2) + \frac{E[X^2]}{2\sigma^2}
         = \frac{1}{2} \ln(2\pi\sigma^2) + \frac{\sigma^2}{2\sigma^2}
         = \frac{1}{2} \ln(2\pi e \sigma^2) \text{ nats} = \frac{1}{2} \log(2\pi e \sigma^2) \text{ bits}.

Per Exercise 3, differential entropy is invariant to constant shifts. Therefore this expression gives the differential entropy of all Gaussian random variables with variance \sigma^2, regardless of mean.

Claim: The Gaussian distribution has maximal differential entropy, i.e., if X ~ f_X with second moment E[X^2] \leq \sigma^2 and G ~ N(0, \sigma^2), then h(X) \leq h(G). Equality holds if and only if X ~ N(0, \sigma^2).

Proof:

    0 \leq D(f_X \| f_G) = E\left[ \log \frac{f_X(X)}{f_G(X)} \right] = -h(X) + E[-\log f_G(X)]
                         = -h(X) + E\left[ \frac{1}{2} \log(2\pi\sigma^2) + \frac{X^2}{2\sigma^2} \log e \right].

Because the second moment of X is upper bounded by the second moment of G,

    0 \leq D(f_X \| f_G) \leq -h(X) + E\left[ \frac{1}{2} \log(2\pi\sigma^2) + \frac{G^2}{2\sigma^2} \log e \right]
                            = -h(X) + E[-\log f_G(G)] = -h(X) + h(G).

Rearranging, h(X) \leq h(G). Equality holds when D(f_X \| f_G) = 0, i.e., when X ~ N(0, \sigma^2).

Example III: Channel capacity of an Additive White Gaussian Noise (AWGN) channel restricted to power P. The AWGN channel with noise parameter \sigma^2 has real-valued input and output related as

    Y_i = X_i + W_i,

where the W_i are i.i.d. N(0, \sigma^2) and independent of the X_i. Power constraint: \frac{1}{n} \sum_{i=1}^{n} X_i^2 \leq P. The Channel Coding Theorem in this setting states that

    C(P) = \max_{E[X^2] \leq P} I(X; Y),

where C(P) represents the capacity, i.e., the maximal rate of reliable communication when constrained to power P. This maximization problem will be addressed in the next lecture.
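To illustrate the maximum-entropy claim numerically, the sketch below (an illustration under assumptions I chose, not part of the notes) estimates h(X) = E[-\log f_X(X)] by Monte Carlo for three zero-mean distributions with the same variance \sigma^2 = 1: Gaussian, uniform, and Laplace. The Gaussian should come out largest, matching \frac{1}{2}\log_2(2\pi e \sigma^2).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000_000
log2e = np.log2(np.e)
sigma2 = 1.0                                 # common second moment E[X^2] = sigma^2

g = rng.normal(0.0, np.sqrt(sigma2), size=n)                        # Gaussian, var sigma^2
u = rng.uniform(-np.sqrt(3 * sigma2), np.sqrt(3 * sigma2), size=n)  # uniform, var sigma^2
b = np.sqrt(sigma2 / 2.0)
lap = rng.laplace(0.0, b, size=n)                                   # Laplace, var 2b^2 = sigma^2

# Monte Carlo estimates of E[-log f(X)] using each distribution's own density.
h_gauss = np.mean(g ** 2 / (2 * sigma2) + 0.5 * np.log(2 * np.pi * sigma2)) * log2e
h_unif = np.log2(2 * np.sqrt(3 * sigma2))                           # exact: log2(b - a)
h_lap = np.mean(np.abs(lap) / b + np.log(2 * b)) * log2e

print(f"h(Gaussian) ~ {h_gauss:.4f} bits  "
      f"(0.5 log2(2 pi e sigma^2) = {0.5 * np.log2(2 * np.pi * np.e * sigma2):.4f})")
print(f"h(Uniform)  = {h_unif:.4f} bits")
print(f"h(Laplace)  ~ {h_lap:.4f} bits")
```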