Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Department of Mechanical Engineering
6.050J/2.110J Information and Entropy Spring 2005

Issued: March 7, 2005
Problem Set 6 Solutions
Due: March 11, 2005

Solution to Problem 1: When Hell Freezes Over

Solution to Problem 1, part a.

The probability of MIT closing on any given day can be found as follows:

P(C) = P(C|F) P(F) / P(F|C)    (6-1)
     = (0.15 × 0.01) / 1       (6-2)
     = 0.0015                  (6-3)

Solution to Problem 1, part b.

The probability of MIT closing and Hell freezing over on the same day:

P(C,F) = P(C|F) P(F)    (6-4)
       = 0.15 × 0.01    (6-5)
       = 0.0015         (6-6)

OR

P(C,F) = P(F|C) P(C)    (6-7)
       = 1 × 0.0015     (6-8)
       = 0.0015         (6-9)

(Note that there are multiple ways to represent the joint probability symbolically. Acceptable forms are P(C,F), P(C&F), and P(CF). The order of C and F is unimportant.)

Solution to Problem 1, part c.

Hell is expected to freeze over 365 × P(F) = 3.65 days a year.

Solution to Problem 1, part d.

MIT is expected to close 365 × P(C) = 0.5475 days a year.
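For a quick numerical check of parts a through d, here is a minimal Python sketch; it simply restates the probabilities used above (P(C|F) = 0.15, P(F) = 0.01, and P(F|C) = 1):

# Sketch of the Problem 1 arithmetic.
p_c_given_f = 0.15   # P(C|F): probability MIT closes given that Hell freezes over
p_f = 0.01           # P(F): probability Hell freezes over on any given day
p_f_given_c = 1.0    # P(F|C): Hell has certainly frozen over whenever MIT closes

# Part a: Bayes' rule for P(C)
p_c = p_c_given_f * p_f / p_f_given_c        # 0.0015

# Part b: the joint probability, computed both ways
p_cf = p_c_given_f * p_f                     # 0.0015
assert abs(p_cf - p_f_given_c * p_c) < 1e-12

# Parts c and d: expected days per year
print(365 * p_f)   # 3.65 days/year Hell freezes over
print(365 * p_c)   # 0.5475 days/year MIT closes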

Solution to Problem 1, part e.

In general, to answer this question we would need to compute P(C,2006 | C,2005). That can be readily done using Bayes' rule: P(C,2006 | C,2005) = P(C,2006, C,2005) / P(C,2005). Unfortunately, we do not have a model of the joint probability of MIT closing in two different years. Under these conditions a common assumption is that both events are independent (much like coin tosses are), in which case the answer is simply

P(C,2006 | C,2005) = P(C,2006, C,2005) / P(C,2005)      (6-10)
                   = P(C,2006) P(C,2005) / P(C,2005)    (6-11)
                   = P(C,2006)                          (6-12)

and we can safely take P(C,2006) = P(C,2005).

Solution to Problem 2: Communicate

Solution to Problem 2, part a.

Since this is a symmetric binary channel (a 1 is as likely to flip to a 0 as a 0 is to flip to a 1), we can use the following derived form of Shannon's channel capacity formula. H(x) is defined to be the number of bits of information in x. Note that x is not an argument of H; you can read that notation as "the entropy of x" or "the amount of unknown information in x".

H = Σ_{i=1}^{n} p_i log2(1/p_i)    (6-14)

H(transmitted) = H(input) - H(loss)
W = Channel Bandwidth = (Max channel capacity with errors)
C = Max channel capacity (without errors) = W × H(transmitted)

In these equations the loss term is the information lost when the physical noise changes bits within the channel. We assume that the input bitstream has equal probabilities of 1s and 0s, so H(input) = 1. We assume this because otherwise we would not be utilizing all of the channel bandwidth. So our next task is to determine the number of bits of information lost because of errors in the channel. This is the information we do not have about the input even if we know the output.

H(loss) = -0.25 log2(0.25) - 0.75 log2(0.75)    (6-15)
        = 0.5 + 0.3113                          (6-16)
H(loss) = 0.8113                                (6-17)

H(loss) is the number of bits that are distorted per input bit. The difference with the initial entropy (or uncertainty) is the amount of information that makes it to the output uncorrupted: the information transmitted. Applying these results yields the following.

H(transmitted) = 1 - 0.8113 = 0.1887    (6-19)
W = 1000 bits/sec                       (6-20)
C = 1000 × 0.1887                       (6-21)
  = 188.7 bits/sec                      (6-22)
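For a quick numerical check of part a, here is a minimal Python sketch (the helper name binary_entropy is ours):

import math

def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a binary random variable with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

epsilon = 0.25     # probability that the channel flips a bit
W = 1000           # channel rate, bits/second

h_loss = binary_entropy(epsilon)   # ~0.8113 bits lost per input bit
h_transmitted = 1.0 - h_loss       # ~0.1887 useful bits per input bit
C = W * h_transmitted              # ~188.7 bits/second

print(h_loss, h_transmitted, C)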

Solution to Problem 2, part b.

Since triple redundancy is being used, we'll encapsulate the source encoder to form a new channel, as in the following diagram.

Figure 6-1: New channel model after the addition of triple redundancy. The new channel wraps the original one: Source -> Source Encoder (Triple Redundancy) -> Channel -> Source Decoder (Majority Vote) -> Destination, with noise entering at the channel. The source encoder accepts 333.3 bits/second and the underlying channel carries 1000 bits/second.

Since the original channel has a bandwidth of 1000 bits/sec and we are now using triple redundancy, the new maximum rate at which we can feed bits into the new channel is 333 1/3 bits/sec.

Solution to Problem 2, part c.

Our channel characteristics also change: the probability of a bit error is no longer 25%. Using Bayes' theorem and the binomial distribution to establish the probability of having a given number of errors, we can obtain the new probability of a bit error. Let A be the event that the redundancy fails, and let us call ε the original error rate.

P(A, 0 errors) = P(A | 0 errors) P(0 errors) = 0    (6-23)
P(A, 1 error)  = P(A | 1 error) P(1 error) = 0      (6-24)
P(A, 2 errors) = P(A | 2 errors) P(2 errors)        (6-25)
               = (3 choose 2) ε^2 (1 - ε)           (6-26)
               = 3 (0.25)^2 (1 - 0.25)              (6-27)
P(A, 3 errors) = P(A | 3 errors) P(3 errors)        (6-28)
               = (3 choose 3) ε^3                   (6-29)
               = (0.25)^3                           (6-30)

P(A) = P(A, 0 errors) + P(A, 1 error) + P(A, 2 errors) + P(A, 3 errors)    (6-31)
P(A) = 0.1406 + 0.0156                                                     (6-32)
P(A) = 0.1562                                                              (6-33)
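The failure probability of the majority-vote decoder is just a binomial tail; the short sketch below recomputes it (the helper name majority_vote_failure is ours, with ε = 0.25 and a 3-bit repetition code as above):

from math import comb

def majority_vote_failure(epsilon: float, n: int = 3) -> float:
    """Probability that more than half of the n repeated bits are flipped,
    i.e. that the repetition code plus majority vote decodes incorrectly."""
    return sum(comb(n, k) * epsilon**k * (1 - epsilon)**(n - k)
               for k in range(n // 2 + 1, n + 1))

epsilon = 0.25
p_A = majority_vote_failure(epsilon)   # 3*(0.25**2)*0.75 + 0.25**3 = 0.15625
print(p_A)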

Solution to Problem 2, part d.

Continuing with our calculations to determine the entropy of the loss, and the resulting maximum channel capacity without errors, yields the following.

H(loss) = -0.1562 log2(0.1562) - 0.8438 log2(0.8438)    (6-34)
        = 0.4184 + 0.2068                               (6-35)
H(loss) = 0.6251                                        (6-36)

H(transmitted) = 1 - 0.6251 = 0.3749      (6-38)
W = 333.333 bits/sec                      (6-39)
C = 333.333 × 0.3749 = 124.95 bits/sec    (6-40)

This new maximum channel capacity is less than that of the original channel. From this result we can see the tradeoff: the decreased bit-error probability is paid for in bandwidth, and with it in maximum channel capacity.

Solution to Problem 2, part e.

The probability of bit error in this new channel is still above the 10% required for the application being designed. However, the probability of error is indeed less than it was before. The price we pay is a lower bit rate.

Solution to Problem 3: Nerds Nest Expands Operations

The first thing to do in this problem is to identify what type of communication the extravagant set-up corresponds to. Figure 6-2 shows how to do that.

Figure 6-2: Scheme for the Nerd's Nest entree communication system; transition probabilities are shown on the lines connecting input with output. The inputs are F (with P(F) = 0.8) and R (with P(R) = 0.2) and the outputs are H and L; F goes to H or L with probability 0.5 each, while R always goes to L.

The conclusion is that we are dealing with an instance of an asymmetric binary communication channel. We can ignore the rest of the detail and characterize this communication system with section 6.6 of the notes ("Noisy Channel") as reference.

Solution to Problem 3, part a.

The first difference with the course notes is that our communication channel is not symmetric; therefore we can expect a different transition matrix:

[ c_HF  c_HR ]   [ 0.5  0 ]
[ c_LF  c_LR ] = [ 0.5  1 ]    (6-41)

Solution to Problem 3, part b.

The probabilities of the outcomes can be obtained by Bayes' rule, summing over all possible inputs:

P(H) = P(H|F) P(F) + P(H|R) P(R) = c_HF P(F) + c_HR P(R) = 0.5 × 0.8 + 0 × 0.2 = 0.4    (6-42)
P(L) = P(L|F) P(F) + P(L|R) P(R) = c_LF P(F) + c_LR P(R) = 0.5 × 0.8 + 1 × 0.2 = 0.6    (6-43)

Solution to Problem 3, part c.

In order to assess the uncertainty before transmission occurs, we need to look at the input probabilities, which were given in the problem statement: P(R) = 0.2 = 1 - P(F). Then, according to the lecture notes, the uncertainty before transmission (also known as entropy, and often referred to with the letter H) is:

U_before = P(R) log2(1/P(R)) + P(F) log2(1/P(F))    (6-44)
         = 0.4644 + 0.2575                          (6-45)
U_before = 0.7219                                   (6-46)

Solution to Problem 3, part d.

In the absence of errors, the capacity of a communication channel is expressed as

C = W log2 n    (6-47)

where n is the number of possible input states and W is the transmission rate (in symbols per second). In the case of a binary channel n = 2, and the problem statement tells us that W = 1 order/second. So the ideal (noise-free) capacity of the system is C = 1 bit/second. You may wonder why we did not use the input probabilities; if so, remember that capacity is a property of the channel. You could have taken the capacity to be defined as C = W M_max, as it is for the noisy case; even with this definition you must find the same result, because M_max = 1 for the noiseless binary channel.

Solution to Problem 3, part e.

Back to the noisy case: we want to compute the mutual information between input and output. In the solutions to the previous problems we have referred to it as H(transmitted); it is the same concept. The lecture notes define this quantity in equations (6.25) and (6.26). The easiest route here is to take the second definition, equation (6.26), because we have already computed all the relevant intermediate quantities:

M = Σ_j P(B_j) log2(1/P(B_j)) - Σ_i P(A_i) [ Σ_j P(B_j | A_i) log2(1/P(B_j | A_i)) ]    (6-48)
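Before adapting equation (6-48) to our channel, here is a small numerical sketch of the quantities computed in parts b through d (the dictionary encoding of the transition matrix is ours; the values are the ones given above):

import math

# Transition probabilities P(output | input) from part a.
c = {('H', 'F'): 0.5, ('H', 'R'): 0.0,
     ('L', 'F'): 0.5, ('L', 'R'): 1.0}
p_in = {'F': 0.8, 'R': 0.2}

# Part b: output probabilities
p_out = {out: sum(c[(out, inp)] * p_in[inp] for inp in p_in) for out in ('H', 'L')}
print(p_out)                 # {'H': 0.4, 'L': 0.6}

# Part c: uncertainty before transmission (source entropy)
U_before = sum(p * math.log2(1 / p) for p in p_in.values())
print(U_before)              # ~0.7219 bits

# Part d: noiseless capacity with W = 1 order/second and n = 2 input states
W, n = 1, 2
print(W * math.log2(n))      # 1 bit/second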

Adapting equation (6-48) to our problem,

M = P(H) log2(1/P(H)) + P(L) log2(1/P(L))               (6-49)
    - P(R) [ c_LR log2(1/c_LR) + c_HR log2(1/c_HR) ]    (6-50)
    - P(F) [ c_LF log2(1/c_LF) + c_HF log2(1/c_HF) ]    (6-51)

and replacing the values of the transition matrix,

M = P(H) log2(1/P(H)) + P(L) log2(1/P(L))        (6-53)
    - P(R) [ 1 log2(1) + 0 log2(0) ]             (6-54)
    - P(F) [ (1/2) log2(2) + (1/2) log2(2) ]     (6-55)

which simplifies to

M = P(H) log2(1/P(H)) + P(L) log2(1/P(L)) - P(F).    (6-56)

Replacing the values of P(H), P(L), and P(F) previously computed,

M = 0.6 log2(1/0.6) + 0.4 log2(1/0.4) - 0.8    (6-57)
  = 0.971 - 0.8                                (6-58)
  = 0.171 bits.                                (6-59)

Solution to Problem 3, part f.

The maximum mutual information over all possible input distributions gives us a quantity that depends only on the channel (as it should, if we are interested in computing the channel capacity); it tells us the maximum amount of information that could be transmitted through this channel. (A good way to understand this is to look at the information flow diagrams introduced in Chapter 7.) We can obtain M_max either graphically or by differentiating M with respect to the input probability distribution. Recall the last step in our derivation of the mutual information:

M = P(H) log2(1/P(H)) + P(L) log2(1/P(L)) - P(F)    (6-60)

We can express P(H) and P(L) in terms of P(F); we already did so in part b. Let us rename p = P(F) and express equation (6-60) in terms of p. After some manipulation we will reach the following expression:

M = log2(2/(2 - p)) + (p/2) log2((2 - p)/p) - p    (6-61)
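Before maximizing over p, the value in equation (6-59) can be cross-checked directly from the definition in equation (6-48); in the sketch below the helper names xlog2_inv and mutual_information are ours, and the transition matrix is the one from part a:

import math

def xlog2_inv(x: float) -> float:
    """x * log2(1/x), with the usual convention that the term is 0 when x = 0."""
    return 0.0 if x == 0.0 else x * math.log2(1.0 / x)

def mutual_information(p_f: float) -> float:
    """Mutual information M of the Problem 3 channel as a function of p = P(F),
    computed directly from the definition in equation (6-48)."""
    p_in = {'F': p_f, 'R': 1.0 - p_f}
    c = {('H', 'F'): 0.5, ('H', 'R'): 0.0,     # transition matrix from part a
         ('L', 'F'): 0.5, ('L', 'R'): 1.0}
    p_out = {o: sum(c[(o, i)] * p_in[i] for i in p_in) for o in ('H', 'L')}
    output_uncertainty = sum(xlog2_inv(q) for q in p_out.values())
    noise = sum(p_in[i] * sum(xlog2_inv(c[(o, i)]) for o in p_out) for i in p_in)
    return output_uncertainty - noise

print(mutual_information(0.8))   # ~0.171 bits, matching equation (6-59)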

To find the maximum, we just have to take the derivative of M with respect to p and set the result equal to zero. Doing so yields the following equation:

dM/dp = (1/2) log2((2 - p)/p) - 1 = 0    (6-62)

After isolating p you should obtain that the maximum of M occurs at p = P(F) = 0.4. Note that after introducing asymmetric noise, the maximum mutual information is no longer centered at 0.5, as it was for the binary noiseless channel. Figure 6-3 illustrates this effect.

Figure 6-3: Comparison of the mutual information for the noise-free and noisy channels, plotted as a function of the input probability p = P(F). The noiseless binary channel peaks at p = 0.5, while the noisy channel's maximum M_max occurs at p = 0.4.

This is due to a competing effect between maximizing the amount of information transmitted and minimizing the errors introduced by the channel: since the errors are no longer symmetric, minimizing them amounts to breaking the balance between symbols, but on the other hand breaking this balance tends to decrease the entropy before transmission. This tradeoff is what makes M(p) have a maximum. From the graph, or by replacing p = 0.4 in equation (6-61), we get M_max = 0.3219.

Solution to Problem 3, part g.

To compute the channel capacity now, we just need to plug the value of M_max into the formula for the capacity of a noisy channel given in the notes: C = W M_max = 0.3219 bits/second. So the manager has to be concerned: the noisy channel can only transmit 0.3219 bits/second of information, and information gets lost along the way. Note that if the system had been error-free, there would have been no problem keeping up with incoming customers.
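The maximizing input distribution and the resulting capacity can also be found numerically from the closed form in equation (6-61); the coarse grid search in the sketch below is only for illustration:

import math

def closed_form_M(p: float) -> float:
    """Mutual information of the Problem 3 channel as a function of p = P(F),
    using the closed form of equation (6-61)."""
    return math.log2(2 / (2 - p)) + (p / 2) * math.log2((2 - p) / p) - p

# Grid search over the input distribution; the analytic answer is p = 0.4.
best_p = max((k / 10000 for k in range(1, 10000)), key=closed_form_M)
M_max = closed_form_M(best_p)
print(best_p, M_max)     # ~0.4 and ~0.3219 bits

W = 1                    # orders (symbols) per second
print(W * M_max)         # capacity ~0.3219 bits/second, well below 1 order/second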