3F1: Signals and Systems INFORMATION THEORY Examples Paper Solutions


Engineering Tripos Part IIA, Third Year
3F1: Signals and Systems, INFORMATION THEORY, Examples Paper Solutions

1. Let the joint probability mass function of two binary random variables X and Y be given in the following table:

    x, y    P_XY(x, y)
    0, 0    0.2
    0, 1    0.3
    1, 0    0.1
    1, 1    0.4

Express the following quantities in terms of the binary entropy function h(x) = -x log₂ x - (1-x) log₂(1-x).

(a) H(X)

We marginalise P_X(0) = P_XY(0,0) + P_XY(0,1) = 1/2 and hence H(X) = h(1/2) = 1 bit.

(b) H(Y)

Similarly, P_Y(0) = P_XY(0,0) + P_XY(1,0) = 0.3 and hence H(Y) = h(0.3).

(c) H(X|Y)

We compute

    P_X|Y(0|0) = P_XY(0,0) / P_Y(0) = 0.2/0.3 = 2/3

and

    P_X|Y(0|1) = P_XY(0,1) / P_Y(1) = 0.3/0.7 = 3/7

and hence

    H(X|Y) = P_Y(0) H(X|Y=0) + P_Y(1) H(X|Y=1) = 0.3 h(2/3) + 0.7 h(3/7).

(d) H(Y|X)

We compute

    P_Y|X(0|0) = P_XY(0,0) / P_X(0) = 0.2/0.5 = 2/5

and

    P_Y|X(0|1) = P_XY(1,0) / P_X(1) = 0.1/0.5 = 1/5

and hence

    H(Y|X) = P_X(0) H(Y|X=0) + P_X(1) H(Y|X=1) = 0.5 h(2/5) + 0.5 h(1/5).

(e) H(XY)

Using the chain rule, we can compute either

    H(XY) = H(X) + H(Y|X) = 1 + 0.5 h(2/5) + 0.5 h(1/5) = 1.8464 bits

or

    H(XY) = H(Y) + H(X|Y) = h(0.3) + 0.3 h(2/3) + 0.7 h(3/7) = 1.8464 bits,

which, as can be observed, yields the same result.

(f) I(X;Y)

We can write either

    I(X;Y) = H(X) - H(X|Y) = 1 - 0.3 h(2/3) - 0.7 h(3/7) = 0.0349 bits

or

    I(X;Y) = H(Y) - H(Y|X) = h(0.3) - 0.5 h(2/5) - 0.5 h(1/5) = 0.0349 bits,

which, as can again be observed, yields the same result.

2. Let an N-ary random variable X be distributed as follows:

    P_X(1) = p,
    P_X(k) = (1-p)/(N-1)   for k = 2, 3, ..., N.

Express the entropy of X in terms of the binary entropy function h(·).

    H(X) = -p log₂ p - (N-1) [(1-p)/(N-1)] log₂[(1-p)/(N-1)]
         = -p log₂ p - (1-p) log₂[(1-p)/(N-1)]
         = -p log₂ p - (1-p) log₂(1-p) + (1-p) log₂(N-1)
         = h(p) + (1-p) log₂(N-1).
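The quantities in Questions 1 and 2 are easy to verify numerically. The following short script (an illustrative cross-check of my own, not part of the original paper; all function and variable names are mine) recomputes the entropies and the mutual information from the joint table, and spot-checks the Question 2 formula for one arbitrary choice of p and N.

    # Numerical cross-check of Questions 1 and 2 (illustrative only).
    import math

    def h(p):
        """Binary entropy in bits."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    # Question 1: joint pmf P_XY(x, y)
    P = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}
    Px = {x: sum(P[x, y] for y in (0, 1)) for x in (0, 1)}
    Py = {y: sum(P[x, y] for x in (0, 1)) for y in (0, 1)}

    HX = h(Px[0])                                      # 1.0
    HY = h(Py[0])                                      # h(0.3) = 0.8813
    HXY = -sum(p * math.log2(p) for p in P.values())   # 1.8464
    HX_given_Y = HXY - HY                              # 0.3 h(2/3) + 0.7 h(3/7)
    HY_given_X = HXY - HX                              # 0.5 h(2/5) + 0.5 h(1/5)
    IXY = HX + HY - HXY                                # 0.0349
    print(HX, HY, HXY, HX_given_Y, HY_given_X, IXY)

    # Question 2: H(X) = h(p) + (1-p) log2(N-1), checked for one (p, N)
    p, N = 0.3, 5
    probs = [p] + [(1 - p) / (N - 1)] * (N - 1)
    assert abs(-sum(q * math.log2(q) for q in probs)
               - (h(p) + (1 - p) * math.log2(N - 1))) < 1e-12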

3. A discrete memoryless source has an alphabet of eight letters, x_i, i = 1, ..., 8, with probabilities 0.25, 0.2, 0.15, 0.12, 0.1, 0.08, 0.05 and 0.05.

(a) Use the Huffman encoding to determine a binary code for the source output.

The working merges the two smallest probabilities at each step (merge order shown in brackets):

    (1) 0.05 + 0.05 = 0.1
    (2) 0.08 + 0.1  = 0.18   (merging with the newly formed 0.1)
    (3) 0.1  + 0.12 = 0.22
    (4) 0.15 + 0.18 = 0.33
    (5) 0.2  + 0.22 = 0.42
    (6) 0.25 + 0.33 = 0.58
    (7) 0.42 + 0.58 = 1.0

Labelling the two branches of each merge 0 and 1 gives a prefix code with codeword lengths 2, 2, 3, 3, 3, 4, 5, 5, for example:

    Symbol   Code
    x1       00
    x2       01
    x3       100
    x4       101
    x5       110
    x6       1110
    x7       11110
    x8       11111

Note that other labellings are possible, and there is also a choice at the second merge since there are two probabilities of 0.1 at that time.

(b) Determine the average codeword length L.

Average codeword length:

    L = 2(0.25) + 2(0.2) + 3(0.15) + 3(0.12) + 3(0.1) + 4(0.08) + 5(0.05) + 5(0.05) = 2.83 bits.
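As a cross-check of parts (a) and (b), the sketch below builds a Huffman code with Python's heapq module (this is my own illustration, not part of the original paper). Because of the tie noted above, the individual codeword lengths it returns may differ from the hand-worked tree, but the average length is 2.83 bits in every case.

    # Sketch: Huffman codeword lengths for the eight-letter source of Question 3.
    import heapq

    def huffman_lengths(probs):
        """Return a Huffman codeword length for each symbol probability."""
        # Heap entries: (probability, tie-breaker, symbol indices below this node)
        heap = [(p, i, [i]) for i, p in enumerate(probs)]
        heapq.heapify(heap)
        lengths = [0] * len(probs)
        tie = len(probs)
        while len(heap) > 1:
            p1, _, s1 = heapq.heappop(heap)
            p2, _, s2 = heapq.heappop(heap)
            for s in s1 + s2:      # every symbol under this merge gains one bit
                lengths[s] += 1
            heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
            tie += 1
        return lengths

    probs = [0.25, 0.2, 0.15, 0.12, 0.1, 0.08, 0.05, 0.05]
    lengths = huffman_lengths(probs)
    print(lengths)                                     # depends on tie-breaking
    print(sum(l * p for l, p in zip(lengths, probs)))  # 2.83 in all cases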

(c) Determine the entropy of the source and hence its efficiency.

    H(S) = -(0.25 log₂ 0.25 + 0.2 log₂ 0.2 + 0.15 log₂ 0.15 + 0.12 log₂ 0.12 + 0.1 log₂ 0.1 + 0.08 log₂ 0.08 + 0.05 log₂ 0.05 + 0.05 log₂ 0.05) = 2.798 bits.

The average codeword length is greater than the entropy, as expected.

    Efficiency = 2.798 / 2.83 = 98.9%.

4. Show that for statistically independent events

    H(X_1, X_2, ..., X_n) = Σ_{i=1}^{n} H(X_i).

First expand out H(X_1, X_2, ..., X_n):

    H(X_1, ..., X_n) = - Σ_{i_1=1}^{N_1} Σ_{i_2=1}^{N_2} ... Σ_{i_n=1}^{N_n} P(x_{i_1}, x_{i_2}, ..., x_{i_n}) log₂ P(x_{i_1}, x_{i_2}, ..., x_{i_n}).

The probabilities are independent, so this is

    = - Σ_{i_1=1}^{N_1} ... Σ_{i_n=1}^{N_n} [ Π_{j=1}^{n} P(x_{i_j}) ] log₂ [ Π_{j=1}^{n} P(x_{i_j}) ].

The last summation only indexes i_n, so we can move all the other factors of the product to the left of it. We can also replace the log of a product with the sum of the logs:

    = - Σ_{i_1=1}^{N_1} ... Σ_{i_{n-1}=1}^{N_{n-1}} [ Π_{j=1}^{n-1} P(x_{i_j}) ] Σ_{i_n=1}^{N_n} P(x_{i_n}) [ log₂ Π_{j=1}^{n-1} P(x_{i_j}) + log₂ P(x_{i_n}) ].

In the right-hand sum the first term does not depend on i_n, and since Σ_{i_n=1}^{N_n} P(x_{i_n}) = 1 we can transform

    Σ_{i_n=1}^{N_n} P(x_{i_n}) [ log₂ Π_{j=1}^{n-1} P(x_{i_j}) + log₂ P(x_{i_n}) ] = log₂ Π_{j=1}^{n-1} P(x_{i_j}) + Σ_{i_n=1}^{N_n} P(x_{i_n}) log₂ P(x_{i_n}).

Now the right-hand term is independent of i_1, ..., i_{n-1}, and

    Σ_{i_1=1}^{N_1} ... Σ_{i_{n-1}=1}^{N_{n-1}} Π_{j=1}^{n-1} P(x_{i_j}) = 1,

so we have

    H(X_1, ..., X_n) = - Σ_{i_n=1}^{N_n} P(x_{i_n}) log₂ P(x_{i_n}) - Σ_{i_1=1}^{N_1} ... Σ_{i_{n-1}=1}^{N_{n-1}} [ Π_{j=1}^{n-1} P(x_{i_j}) ] log₂ Π_{j=1}^{n-1} P(x_{i_j})
                     = H(X_n) + H(X_1, X_2, ..., X_{n-1});

by induction this gives

    H(X_1, X_2, ..., X_n) = Σ_{i=1}^{n} H(X_i).
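A small numerical illustration of this additivity (my own, not from the paper): the joint entropy computed directly from the product pmf of independent variables equals the sum of the marginal entropies. The three example marginals are arbitrary.

    # Illustration of Question 4: joint entropy of independent variables.
    import itertools
    import math

    def entropy(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    marginals = [
        [0.5, 0.5],
        [0.7, 0.2, 0.1],
        [0.25, 0.25, 0.25, 0.25],
    ]

    # Joint pmf of independent variables = product of the marginals
    joint = [math.prod(ps) for ps in itertools.product(*marginals)]

    print(entropy(joint))                      # H(X1, X2, X3)
    print(sum(entropy(m) for m in marginals))  # sum of H(Xi): the same value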

5. A five-level non-uniform quantizer for a zero-mean signal results in the 5 levels -b, -a, 0, a, b with corresponding probabilities of occurrence p(-b) = p(b) = 0.05, p(-a) = p(a) = 0.1 and p(0) = 0.7.

(a) Design a Huffman code that encodes one signal sample at a time and determine the average bit rate per sample.

Merging the two smallest probabilities at each step gives

    0.05 + 0.05 = 0.1,   0.1 + 0.1 = 0.2,   0.2 + 0.1 = 0.3,   0.3 + 0.7 = 1.0,

so the level 0 gets a 1-bit codeword and the other four levels get 3-bit codewords. One possible labelling of the branches gives:

    Symbol   Code
    0        0
    -a       100
    a        101
    -b       110
    b        111

Average bit rate = 1(0.7) + 3(0.1 + 0.1 + 0.05 + 0.05) = 1.6 bits.

Another version of the code would give 2 bits to -a or to a and 4 bits each to -b and b. It has the same mean length.

(b) Design a Huffman code that encodes two output samples at a time and determine the average bit rate per sample.

We combine the 5 levels into 25 pairs of levels, in descending order of probability, and form a Huffman code tree in the same way. The pair probabilities are: 0.49 for (0, 0); 0.07 for each of the four pairs of 0 with ±a; 0.035 for each of the four pairs of 0 with ±b; 0.01 for each of the four pairs of ±a with ±a; 0.005 for each of the eight pairs of ±a with ±b; and 0.0025 for each of the four pairs of ±b with ±b.

Note that there are many different possible versions of this tree, as many of the nodes have equal probabilities and so can be taken in arbitrary order. However, the mean length, and hence the performance of the code, is the same in all cases. The codeword length of each pair is found by counting the number of branch points passed in going from the root of the tree to that pair. One such tree gives: 1 bit for (0, 0); 4 bits for each pair of 0 with ±a; 5 bits for each pair of 0 with ±b; 6 bits for two of the pairs of ±a with ±a and 7 bits for the other two; 7 bits for each pair of ±a with ±b; and 8 bits for each pair of ±b with ±b.

Summing probabilities for codewords of the same length:

    Av. codeword length = 1(0.49) + 4(0.28) + 5(0.14) + 6(0.02) + 7(0.06) + 8(0.01) = 2.93 bits per pair of samples.

Hence code rate = 2.93/2 = 1.465 bits/sample.

(c) What are the efficiencies of these two codes?

Entropy of the message source = -(0.7 log₂ 0.7 + 2 × 0.1 log₂ 0.1 + 2 × 0.05 log₂ 0.05) = 1.4568 bits/sample.

Hence:

    Efficiency of code 1 = 1.4568/1.6 = 91.05%
    Efficiency of code 2 = 1.4568/1.465 = 99.44%

Code 2 represents a significant improvement, because it eliminates the zero state of code 1, which has a probability well above 0.5.
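The rates and efficiencies above can be reproduced with a short script (an illustrative cross-check of my own, not part of the original paper; it reuses the same heapq construction as the Question 3 sketch).

    # Cross-check of Question 5: Huffman rates for single samples and for pairs.
    import heapq
    import itertools
    import math

    def huffman_average_length(probs):
        """Average codeword length of a Huffman code for the given pmf."""
        heap = [(p, i, [i]) for i, p in enumerate(probs)]
        heapq.heapify(heap)
        lengths = [0] * len(probs)
        tie = len(probs)
        while len(heap) > 1:
            p1, _, s1 = heapq.heappop(heap)
            p2, _, s2 = heapq.heappop(heap)
            for s in s1 + s2:
                lengths[s] += 1
            heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
            tie += 1
        return sum(l * p for l, p in zip(lengths, probs))

    single = [0.7, 0.1, 0.1, 0.05, 0.05]            # levels 0, -a, a, -b, b
    pairs = [p * q for p, q in itertools.product(single, repeat=2)]

    rate1 = huffman_average_length(single)          # 1.6 bits/sample
    rate2 = huffman_average_length(pairs) / 2       # 1.465 bits/sample
    H = -sum(p * math.log2(p) for p in single)      # 1.4568 bits/sample
    print(rate1, rate2, H, H / rate1, H / rate2)    # efficiencies 0.9105, 0.9944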

6. While we cover in 3F1 and 4F5 the application of Shannon's theory to data compression and transmission, Shannon also applied the concepts of entropy and mutual information to the study of secrecy systems. The figure below shows a cryptographic scenario where Alice wants to transmit a secret plaintext message X to Bob and they share a secret key Z, while the enemy Eve has access to the public message Y.

[Figure: Alice's encrypter maps the plaintext X from the message source and the key Z from the key source to the public ciphertext Y; Bob's decrypter recovers X from Y and Z and passes it to the message destination; Eve, the enemy cryptanalyst, observes only Y.]

(a) Write out two conditions using conditional entropies involving X, Y and Z to enforce the deterministic encryptability and decryptability of the messages. Hint: the entropy of a function given its argument is zero, e.g., for any random variable X, H(f(X)|X) = 0.

The conditions are H(Y|XZ) = 0, because the ciphertext is a function of the secret plaintext message and the secret key, and H(X|YZ) = 0, because the secret plaintext message can be inferred from the ciphertext and the key.

(b) Shannon made the notion of an unbreakable cryptosystem precise by saying that a cryptosystem provides perfect secrecy if the enemy's observation is statistically independent of the plaintext, i.e., I(X;Y) = 0. Show that this implies Shannon's much-cited bound on key size H(Z) ≥ H(X), i.e., perfect secrecy can only be attained if the entropy of the key (and hence its compressed length) is at least as large as the entropy of the secret plaintext.

Since I(X;Y) = 0, we have

    H(X) = H(X|Y)
         = H(X,Z|Y) - H(Z|X,Y)
         ≤ H(X,Z|Y)
         = H(Z|Y) + H(X|Z,Y)
         = H(Z|Y)
         ≤ H(Z),

where H(X|Z,Y) = 0 by the decryptability condition of part (a), and the last inequality holds because conditioning cannot increase entropy.
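The bound can also be illustrated by brute force on toy alphabets (this example is entirely my own and not part of the paper). With a uniform three-letter plaintext, H(X) = log₂ 3 ≈ 1.58 bits, so a one-bit uniform key (H(Z) = 1 bit) should never give perfect secrecy, whereas a uniform three-valued key used as a modular shift does. The ciphertext alphabet is restricted to three symbols purely to keep the search small.

    # Brute-force illustration of the key-size bound H(Z) >= H(X) on toy alphabets.
    from itertools import permutations, product

    M = [0, 1, 2]   # plaintext alphabet, taken uniform, so H(X) = log2(3)
    C = [0, 1, 2]   # ciphertext alphabet (restricted to 3 symbols for tractability)

    def perfectly_secret(enc_tables, key_probs):
        """enc_tables[k][m] = ciphertext; perfect secrecy needs P(Y|X=x) equal for all x."""
        dists = []
        for x in M:
            d = {y: 0.0 for y in C}
            for k, table in enumerate(enc_tables):
                d[table[x]] += key_probs[k]
            dists.append(d)
        return all(d == dists[0] for d in dists)

    # 1) One-bit uniform key: decryptability forces each per-key map to be injective,
    #    so try every pair of permutations of C.
    found = any(perfectly_secret([t0, t1], [0.5, 0.5])
                for t0, t1 in product(permutations(C), repeat=2))
    print(found)   # False: no such cipher achieves perfect secrecy

    # 2) Three-valued uniform key, shift cipher y = (x + k) mod 3: perfectly secret.
    shift = [tuple((x + k) % 3 for x in M) for k in range(3)]
    print(perfectly_secret(shift, [1/3, 1/3, 1/3]))   # True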

(c) Vernam's cipher assumes a binary secret plaintext message X with any probability distribution P_X(0) = p = 1 - P_X(1) and a binary secret key Z that is uniform, P_Z(0) = P_Z(1) = 1/2, and independent of X. The encrypter simply adds the secret key to the plaintext modulo 2, and the decrypter, by adding the same key to the ciphertext, can recover the plaintext. Show that Vernam's cipher achieves perfect secrecy, i.e., I(X;Y) = 0.

We have

    P_Y(0) = Σ_x Σ_z P_XYZ(x, 0, z) = Σ_x Σ_z P_Y|XZ(0|x,z) P_X(x) P_Z(z)

and note that P_Y|XZ(0|x,z) = 1 if x + z mod 2 = 0, and P_Y|XZ(0|x,z) = 0 otherwise, and so the expression above becomes

    P_Y(0) = p (1/2) + (1-p) (1/2) = 1/2

and hence H(Y) = 1 bit. Furthermore,

    P_Y|X(0|0) = P_XY(0,0) / P_X(0)
               = [P_XYZ(0,0,0) + P_XYZ(0,0,1)] / P_X(0)
               = [P_X(0) P_Z(0) P_Y|XZ(0|0,0) + P_X(0) P_Z(1) P_Y|XZ(0|0,1)] / P_X(0)
               = (p/2 + 0)/p = 1/2

and similarly

    P_Y|X(0|1) = P_XY(1,0) / P_X(1)
               = [P_XYZ(1,0,0) + P_XYZ(1,0,1)] / P_X(1)
               = [P_X(1) P_Z(0) P_Y|XZ(0|1,0) + P_X(1) P_Z(1) P_Y|XZ(0|1,1)] / P_X(1)
               = (0 + (1-p)/2)/(1-p) = 1/2

and hence

    H(Y|X) = P_X(0) H(Y|X=0) + P_X(1) H(Y|X=1) = p h(1/2) + (1-p) h(1/2) = 1 bit,

and so, finally, I(X;Y) = H(Y) - H(Y|X) = 1 - 1 = 0 bits.
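A quick numerical confirmation of this result (my own, not part of the paper): build the joint distribution of X and Y = X xor Z for an arbitrary plaintext bias p and a uniform independent key, and evaluate I(X;Y) directly.

    # Numerical check of Question 6(c): I(X;Y) = 0 for the Vernam cipher.
    import math

    def mutual_information(joint):
        """I(X;Y) in bits from a joint pmf given as {(x, y): probability}."""
        px, py = {}, {}
        for (x, y), q in joint.items():
            px[x] = px.get(x, 0.0) + q
            py[y] = py.get(y, 0.0) + q
        return sum(q * math.log2(q / (px[x] * py[y]))
                   for (x, y), q in joint.items() if q > 0)

    p = 0.9                          # arbitrary plaintext distribution, P_X(0) = p
    joint_xy = {}
    for x, p_x in [(0, p), (1, 1 - p)]:
        for z in (0, 1):             # uniform key, independent of X
            y = x ^ z
            joint_xy[(x, y)] = joint_xy.get((x, y), 0.0) + p_x * 0.5

    print(mutual_information(joint_xy))   # 0.0 (up to floating-point rounding)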

7. What is the entropy of the following continuous probability density functions?

(a) P(x) = 0 for x < -2, P(x) = 0.25 for -2 < x < 2, P(x) = 0 for x > 2.

    H(X) = -∫_{-2}^{2} 0.25 log₂ 0.25 dx = -log₂ 0.25 = log₂ 4 = 2 bits.

(b) P(x) = (λ/2) e^(-λ|x|)

Working in natural logarithms,

    H(X) = -∫_{-∞}^{∞} (λ/2) e^(-λ|x|) ln[(λ/2) e^(-λ|x|)] dx
         = -∫_{-∞}^{∞} (λ/2) e^(-λ|x|) [ln(λ/2) - λ|x|] dx
         = -ln(λ/2) ∫_{-∞}^{∞} (λ/2) e^(-λ|x|) dx + λ ∫_{-∞}^{∞} |x| (λ/2) e^(-λ|x|) dx
         = -ln(λ/2) + λ ∫_{0}^{∞} x λ e^(-λx) dx;

integrating by parts with u = x and v' = λ e^(-λx), so v = -e^(-λx):

         = -ln(λ/2) + λ ( [-x e^(-λx)]_{0}^{∞} + ∫_{0}^{∞} e^(-λx) dx )
         = -ln(λ/2) + 1
         = ln(2e/λ) nats = log₂(2e/λ) bits.

(c) P(x) = (1/(σ√(2π))) e^(-x²/(2σ²))

Writing p(x) for this density,

    H(X) = -∫_{-∞}^{∞} p(x) ln p(x) dx
         = ln(σ√(2π)) ∫_{-∞}^{∞} p(x) dx + (1/(2σ²)) ∫_{-∞}^{∞} x² p(x) dx
         = ln(σ√(2π)) + (1/2) ∫_{-∞}^{∞} x (x/σ²) p(x) dx;

integrating by parts with u = x and v' = (x/σ²) p(x), giving v = -p(x):

    ∫_{-∞}^{∞} x (x/σ²) p(x) dx = [-x p(x)]_{-∞}^{∞} + ∫_{-∞}^{∞} p(x) dx = 0 + 1 = 1,

so

    H(X) = ln(σ√(2π)) + 1/2 = log₂(σ√(2π)) + (1/2) log₂ e = log₂(σ√(2πe)) bits.
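These closed forms can be checked by numerical integration (an illustrative sketch of my own; it assumes scipy is available, and the parameter values λ = 1.5 and σ = 2 are arbitrary).

    # Numerical check of Question 7: differential entropies in bits.
    import math
    from scipy.integrate import quad

    def diff_entropy(pdf, lo, hi):
        """-integral of p(x) log2 p(x) dx."""
        return quad(lambda x: -pdf(x) * math.log2(pdf(x)) if pdf(x) > 0 else 0.0,
                    lo, hi)[0]

    # (a) uniform, density 0.25 on (-2, 2): expect log2(4) = 2 bits
    print(diff_entropy(lambda x: 0.25, -2, 2))

    # (b) Laplacian (lam/2) exp(-lam |x|): expect log2(2 e / lam)
    lam = 1.5
    print(diff_entropy(lambda x: 0.5 * lam * math.exp(-lam * abs(x)),
                       -math.inf, math.inf),
          math.log2(2 * math.e / lam))

    # (c) Gaussian with standard deviation sigma: expect log2(sigma sqrt(2 pi e))
    sigma = 2.0
    gauss = lambda x: math.exp(-x**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
    print(diff_entropy(gauss, -math.inf, math.inf),
          math.log2(sigma * math.sqrt(2 * math.pi * math.e)))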

8. Continuous variables X and Y are normally distributed with standard deviation σ = 1:

    P(x) = (1/√(2π)) e^(-x²/2),    P(y) = (1/√(2π)) e^(-y²/2).

A variable Z is defined by z = x + y. What is the mutual information of X and Z?

    I(Z;X) = H(Z) - H(Z|X).

Z is a normally distributed variable with standard deviation σ = √2, since it is the sum of two independent variables each with standard deviation σ = 1. Hence, by 7(c) above,

    H(Z) = log₂(√2 √(2π)) + (1/2) log₂ e.

If X is known then Z is still normally distributed, but now the mean is x and the standard deviation is 1, since z = x + y and Y has zero mean and standard deviation σ = 1. So

    H(Z|X) = log₂ √(2π) + (1/2) log₂ e.

So

    I(Z;X) = log₂(√2 √(2π)) - log₂ √(2π) = log₂ √2 = 1/2 = 0.5 bit.
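A quick Monte Carlo sanity check (my own, not part of the paper): estimate H(Z) and H(Z|X) from sample standard deviations using the Gaussian entropy formula of 7(c); their difference approaches 0.5 bit.

    # Monte Carlo sanity check of Question 8.
    import math
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)
    z = x + y

    def gaussian_entropy(std):
        """h = log2(sigma sqrt(2 pi e)) bits, from Question 7(c)."""
        return math.log2(std * math.sqrt(2 * math.pi * math.e))

    h_z = gaussian_entropy(z.std())                # sigma_Z is close to sqrt(2)
    h_z_given_x = gaussian_entropy((z - x).std())  # the remaining noise y has sigma = 1
    print(h_z - h_z_given_x)                       # approximately 0.5 bit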

9. A symmetric binary communications channel operates with signalling levels of ±1 volts at the detector in the receiver, and the rms noise level at the detector is 0.25 volts. The binary symbol rate is 100 kbit/s.

(a) Determine the probability of error on this channel and hence, based on mutual information, calculate the theoretical capacity of this channel for error-free communication.

Using notation and results from the 3F1 Random Processes course and from section 3.5 of the 3F1 Information Theory course:

Probability of bit error:

    p_e = Q(v/σ) = Q(1/0.25) = Q(4) = 3.17 × 10⁻⁵.

The mutual information between the channel input X and the channel output Y is then

    I(Y;X) = H(Y) - H(Y|X) = H(Y) - H[p_e, 1-p_e].

Now, assuming equiprobable random bits at the input and output of the channel,

    H(Y) = -[0.5 log₂ 0.5 + 0.5 log₂ 0.5] = 1 bit/sym

and, from the above p_e,

    H[p_e, 1-p_e] = -[3.17×10⁻⁵ log₂(3.17×10⁻⁵) + (1 - 3.17×10⁻⁵) log₂(1 - 3.17×10⁻⁵)]
                  = 4.739×10⁻⁴ + 4.57×10⁻⁵ = 5.196×10⁻⁴ bit/sym.

So I(Y;X) = 1 - 5.196×10⁻⁴ = 0.99948 bit/sym.

Hence the error-free capacity of the binary channel is Symbol rate × I(Y;X) = 100k × 0.99948 = 99.948 kbit/s.

(b) If the binary signalling were replaced by symbols drawn from a continuous process with a Gaussian (normal) pdf with zero mean and the same mean power at the detector, determine the theoretical capacity of this new channel, assuming the symbol rate remains at 100 ksym/s and the noise level is unchanged.

For symbols with a Gaussian pdf, section 4.4 of the course shows that the mutual information of the channel is now given by

    I(Y;X) = h(Y) - h(N) = log₂(σ_Y/σ_N) = (1/2) log₂(1 + σ_X²/σ_N²)

since the noise is uncorrelated with the signal, so σ_Y² = σ_X² + σ_N². For the given channel, σ_X/σ_N = 1/0.25 = 4. Hence

    I(Y;X) = (1/2) log₂(1 + 4²) = 2.0437 bit/sym

and so the theoretical capacity of the Gaussian channel is C_G = 100k × 2.0437 = 204.37 kbit/s, over twice that of the binary channel.
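The numbers in parts (a) and (b) can be reproduced as follows (a sketch of my own, using the complementary error function to evaluate Q(x); not part of the original paper).

    # Cross-check of Question 9(a) and 9(b).
    import math

    def Q(x):
        """Gaussian tail probability, via the complementary error function."""
        return 0.5 * math.erfc(x / math.sqrt(2))

    def h(p):
        """Binary entropy in bits."""
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    rate = 100e3          # symbols per second
    p_e = Q(1 / 0.25)     # +-1 V signalling, 0.25 V rms noise -> Q(4) = 3.17e-5

    C_binary = rate * (1 - h(p_e))                 # about 99.95 kbit/s
    C_gauss = rate * 0.5 * math.log2(1 + 4**2)     # about 204.4 kbit/s
    print(p_e, C_binary, C_gauss)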

(c) The three capacities are plotted below. There is no loss at low SNR for using binary signalling as long as the output remains continuous. As the SNR increases, all binary signalling methods hit a 1 bit/use ceiling, whereas the capacity for continuous signalling continues to grow unbounded.

[Figure: Channel capacity in bits per use versus SNR in dB for three cases: continuous input with continuous output, binary input with binary output, and binary input with continuous output.]
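A sketch of how curves like these can be computed (my own reconstruction, not the script used for the original figure; it assumes scipy is available). The soft-decision curve evaluates the mutual information of an equiprobable ±A input in unit-variance Gaussian noise by numerical integration.

    # Capacity per channel use versus SNR for the three cases in the figure:
    # (i) Gaussian input / continuous output, (ii) binary input / binary output,
    # (iii) binary input / continuous output.
    import math
    from scipy.integrate import quad

    def Q(x):
        return 0.5 * math.erfc(x / math.sqrt(2))

    def h(p):
        return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def c_gaussian(snr):
        return 0.5 * math.log2(1 + snr)

    def c_binary_hard(snr):
        return 1 - h(Q(math.sqrt(snr)))

    def c_binary_soft(snr):
        # Equiprobable +-A input, unit-variance Gaussian noise, A = sqrt(snr).
        a = math.sqrt(snr)
        norm = lambda y, m: math.exp(-(y - m) ** 2 / 2) / math.sqrt(2 * math.pi)
        p_y = lambda y: 0.5 * (norm(y, a) + norm(y, -a))
        h_y = quad(lambda y: -p_y(y) * math.log2(p_y(y)), -a - 10, a + 10, limit=200)[0]
        h_n = 0.5 * math.log2(2 * math.pi * math.e)   # differential entropy of the noise
        return h_y - h_n

    for snr_db in range(-10, 26, 5):
        snr = 10 ** (snr_db / 10)
        print(snr_db, round(c_gaussian(snr), 3),
              round(c_binary_hard(snr), 3), round(c_binary_soft(snr), 3))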