Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Department of Mechanical Engineering

6.050J/2.110J Information and Entropy
Spring 2003

Problem Set 4 Solutions

Solution to Problem 1: Meet the Box People

Solution to Problem 1, part a.

The probabilities that one of the box people's offspring has the various phenotypes are as follows:

i. An offspring has a circular phenotype unless s/he has genes cc. The probability of having cc is the probability that each parent transmits gene c, squared (because the two transmissions are independent events): 0.7^2 = 0.49. This means that the probability that the offspring has a circular shape is 1 - 0.49 = 0.51.

ii. An offspring has a square phenotype when it inherits the recessive gene from both parents. Inheriting a recessive gene from one parent occurs with probability 0.7, so the probability of inheriting it from both parents is 0.7^2 = 0.49, as before.

iii. Using the same reasoning, the probability of a blue phenotype is 0.5^2 = 0.25.

iv. Similarly, the probability of a red phenotype is 1 - 0.25 = 0.75.

Solution to Problem 1, part b.

Given that an offspring is circular, the probabilities of the other phenotypes are as follows:

i. Because the shape and color genes are independent, the probability of having a blue phenotype given a circular shape equals the unconditional probability of having a blue phenotype (genes rr), which is 0.25.

ii. Similarly, the probability of having a red phenotype given a circular shape is 0.75.

Solution to Problem 1, part c.

Given that an offspring is red, the probabilities of the other phenotypes are as follows:

i. Again, the genes are independent, so the probability of a square shape given a red color is 0.49.

ii. The probability of a circular shape given a red color is 1 - 0.49 = 0.51.

Solution to Problem 1, part d.

We can draw a table of events as shown in Table 4-2. The probability that a person has the disease given that the test is positive is:

    (0.001 x 0.95) / (0.001 x 0.95 + 0.999 x 0.004) = 19.2%        (4-2)

    Have Disease?   Percent   Test Result   Percent   Total
    Yes             0.001     Positive      0.95      0.00095
                              Negative      0.05      0.00005
    No              0.999     Positive      0.004     0.003996
                              Negative      0.996     0.995004

    Table 4-2: Triangularity Test Results
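The numbers in parts a through d can be checked with a few lines of Python. This is a verification sketch of ours, not part of the original solution; the variable names are invented here, and the color-gene probability of 0.5 per parent is inferred from the stated result of 0.25.

    # Verification sketch for Problem 1 (not from the original solution).
    p_c = 0.7  # per-parent probability of transmitting recessive shape gene c
    p_r = 0.5  # per-parent probability for recessive color gene r (inferred)

    # Parts a-c: the two transmissions are independent, so a recessive
    # phenotype occurs with the per-parent probability squared.
    p_square = p_c ** 2        # genes cc -> square: 0.49
    p_circular = 1 - p_square  # anything else -> circular: 0.51
    p_blue = p_r ** 2          # genes rr -> blue: 0.25
    p_red = 1 - p_blue         # anything else -> red: 0.75

    # Part d: Bayes' theorem for the triangularity test, equation (4-2).
    prior = 0.001  # P(disease)
    tp = 0.95      # P(positive | disease)
    fp = 0.004     # P(positive | no disease)
    posterior = prior * tp / (prior * tp + (1 - prior) * fp)

    print(round(p_square, 2), round(p_circular, 2), p_blue, p_red)  # 0.49 0.51 0.25 0.75
    print(round(posterior, 3))  # 0.192, i.e. 19.2%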

Solution to Problem 2: Huffman Coding

Solution to Problem 2, part a.

To encode fourteen symbols with a fixed-length code we need four bits per symbol, giving 2^4 = 16 different codewords. For the 44-character message this comes to 4 x 44 = 176 bits.

Solution to Problem 2, part b.

Table 4-3 lists the calculation of the average information per symbol. Here we calculate an average of 3.33 bits per symbol, or about 147 bits for the whole message.

    Character   Frequency   log2(1/p_i)   p_i log2(1/p_i)
    p            20.45%        2.29            0.47
    e            18.18%        2.46            0.45
    space        15.91%        2.65            0.42
    c             6.82%        3.87            0.26
    i             6.82%        3.87            0.26
    k             6.82%        3.87            0.26
    r             6.82%        3.87            0.26
    d             4.55%        4.46            0.20
    a             2.27%        5.46            0.12
    f             2.27%        5.46            0.12
    l             2.27%        5.46            0.12
    o             2.27%        5.46            0.12
    s             2.27%        5.46            0.12
    t             2.27%        5.46            0.12
    Total       100.00%                        3.33

    Table 4-3: Frequency distribution of characters in "peter piper picked a peck of pickled peppers"

Solution to Problem 2, part c.

See Table 4-3.
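The entries in Table 4-3 follow directly from the character counts of the message. Here is a quick check in Python (our sketch, not part of the original solution):

    import math
    from collections import Counter

    message = "peter piper picked a peck of pickled peppers"
    n = len(message)           # 44 characters
    counts = Counter(message)  # the frequencies tabulated in Table 4-3

    # Average information per symbol: sum over symbols of p_i * log2(1/p_i).
    bits_per_symbol = sum((c / n) * math.log2(n / c) for c in counts.values())
    print(round(bits_per_symbol, 2))   # about 3.3 bits per symbol
    print(round(bits_per_symbol * n))  # about 147 bits for the whole message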

Solution to Problem 2, part d.

A possible code is derived below and listed in Table 4-4. At each step the two groups with the lowest total probabilities are merged: a 0 is prepended to every codeword in one group, a 1 to every codeword in the other, and the probabilities add. NA marks a symbol whose codeword has not yet been started. (A short code sketch of this procedure appears after the derivation.)

Start: (p = NA, p = 0.2045) (e = NA, p = 0.1818) (space = NA, p = 0.1591) (c = NA, p = 0.0682) (i = NA, p = 0.0682) (k = NA, p = 0.0682) (r = NA, p = 0.0682) (d = NA, p = 0.0455) (a = NA, p = 0.0227) (f = NA, p = 0.0227) (l = NA, p = 0.0227) (o = NA, p = 0.0227) (s = NA, p = 0.0227) (t = NA, p = 0.0227)

Next: (p = NA, p = 0.2045) (e = NA, p = 0.1818) (space = NA, p = 0.1591) (c = NA, p = 0.0682) (i = NA, p = 0.0682) (k = NA, p = 0.0682) (r = NA, p = 0.0682) (d = NA, p = 0.0455) (a = NA, p = 0.0227) (f = NA, p = 0.0227) (l = NA, p = 0.0227) (o = NA, p = 0.0227) (s = 0, t = 1, p = 0.0454)

Next: (p = NA, p = 0.2045) (e = NA, p = 0.1818) (space = NA, p = 0.1591) (c = NA, p = 0.0682) (i = NA, p = 0.0682) (k = NA, p = 0.0682) (r = NA, p = 0.0682) (d = NA, p = 0.0455) (a = NA, p = 0.0227) (f = NA, p = 0.0227) (l = 0, o = 1, p = 0.0454) (s = 0, t = 1, p = 0.0454)

Next: (p = NA, p = 0.2045) (e = NA, p = 0.1818) (space = NA, p = 0.1591) (c = NA, p = 0.0682) (i = NA, p = 0.0682) (k = NA, p = 0.0682) (r = NA, p = 0.0682) (d = NA, p = 0.0455) (a = 0, f = 1, p = 0.0454) (l = 0, o = 1, p = 0.0454) (s = 0, t = 1, p = 0.0454)

Next: (p = NA, p = 0.2045) (e = NA, p = 0.1818) (space = NA, p = 0.1591) (c = NA, p = 0.0682) (i = NA, p = 0.0682) (k = NA, p = 0.0682) (r = NA, p = 0.0682) (d = NA, p = 0.0455) (a = 0, f = 1, p = 0.0454) (l = 00, o = 01, s = 10, t = 11, p = 0.0908)

Next: (p = NA, p = 0.2045) (e = NA, p = 0.1818) (space = NA, p = 0.1591) (c = NA, p = 0.0682) (i = NA, p = 0.0682) (k = NA, p = 0.0682) (r = NA, p = 0.0682) (d = 0, a = 10, f = 11, p = 0.0909) (l = 00, o = 01, s = 10, t = 11, p = 0.0908)

Next: (p = NA, p = 0.2045) (e = NA, p = 0.1818) (space = NA, p = 0.1591) (c = NA, p = 0.0682) (i = NA, p = 0.0682) (k = 0, r = 1, p = 0.1364) (d = 0, a = 10, f = 11, p = 0.0909) (l = 00, o = 01, s = 10, t = 11, p = 0.0908)

Next: (p = NA, p = 0.2045) (e = NA, p = 0.1818) (space = NA, p = 0.1591) (c = 0, i = 1, p = 0.1364) (k = 0, r = 1, p = 0.1364) (d = 0, a = 10, f = 11, p = 0.0909) (l = 00, o = 01, s = 10, t = 11, p = 0.0908)

Next: (p = NA, p = 0.2045) (e = NA, p = 0.1818) (space = NA, p = 0.1591) (c = 0, i = 1, p = 0.1364) (k = 0, r = 1, p = 0.1364) (d = 00, a = 010, f = 011, l = 100, o = 101, s = 110, t = 111, p = 0.1817)

Next: (p = NA, p = 0.2045) (e = NA, p = 0.1818) (space = NA, p = 0.1591) (c = 00, i = 01, k = 10, r = 11, p = 0.2728) (d = 00, a = 010, f = 011, l = 100, o = 101, s = 110, t = 111, p = 0.1817)

Next: (p = NA, p = 0.2045) (e = NA, p = 0.1818) (c = 00, i = 01, k = 10, r = 11, p = 0.2728) (space = 0, d = 100, a = 1010, f = 1011, l = 1100, o = 1101, s = 1110, t = 1111, p = 0.3408)

Next: (p = 0, e = 1, p = 0.3863) (c = 00, i = 01, k = 10, r = 11, p = 0.2728) (space = 0, d = 100, a = 1010, f = 1011, l = 1100, o = 1101, s = 1110, t = 1111, p = 0.3408)

Next: (p = 0, e = 1, p = 0.3863) (c = 000, i = 001, k = 010, r = 011, space = 10, d = 1100, a = 11010, f = 11011, l = 11100, o = 11101, s = 11110, t = 11111, p = 0.6136)

Final: (p = 00, e = 01, c = 1000, i = 1001, k = 1010, r = 1011, space = 110, d = 11100, a = 111010, f = 111011, l = 111100, o = 111101, s = 111110, t = 111111, p = 1.0000)
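The merging procedure above can be automated. Below is a minimal sketch of the same greedy algorithm in Python (our illustration, not from the original solution); because ties between equal probabilities may be broken differently, it can produce a different but equally compact codebook than Table 4-4.

    import heapq
    from collections import Counter
    from itertools import count

    def huffman_code(message):
        # Each heap entry is (probability, tiebreaker, {symbol: partial codeword}).
        freq = Counter(message)
        n = len(message)
        tiebreak = count()  # unique ints so tied probabilities never compare dicts
        heap = [(f / n, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            # Merge the two lowest-probability groups, prepending 0 and 1.
            p0, _, group0 = heapq.heappop(heap)
            p1, _, group1 = heapq.heappop(heap)
            merged = {sym: "0" + cw for sym, cw in group0.items()}
            merged.update({sym: "1" + cw for sym, cw in group1.items()})
            heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
        return heap[0][2]

    message = "peter piper picked a peck of pickled peppers"
    code = huffman_code(message)
    print(sum(len(code[ch]) for ch in message))  # 149 bits, matching Table 4-5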

    Character   Code
    p           00
    e           01
    space       110
    c           1000
    i           1001
    k           1010
    r           1011
    d           11100
    a           111010
    f           111011
    l           111100
    o           111101
    s           111110
    t           111111

    Table 4-4: Huffman code for "peter piper picked a peck of pickled peppers"

Solution to Problem 2, part e.

When the sequence is encoded using the codebook derived in part d:

i. See Table 4-5.

ii. The fixed-length code requires 176 bits, whereas Huffman coding requires 149 bits. So we find that the Huffman code does a better job than the fixed-length code.

iii. This number compares extremely well with the information content of 147 bits for the message as a whole.

    Character   # of Characters   Bits per Character   Bits Needed
    p                  9                  2                 18
    e                  8                  2                 16
    space              7                  3                 21
    c                  3                  4                 12
    i                  3                  4                 12
    k                  3                  4                 12
    r                  3                  4                 12
    d                  2                  5                 10
    a                  1                  6                  6
    f                  1                  6                  6
    l                  1                  6                  6
    o                  1                  6                  6
    s                  1                  6                  6
    t                  1                  6                  6
    Total             44                                   149

    Table 4-5: Bits needed to encode "peter piper picked a peck of pickled peppers" with the Huffman code of Table 4-4

Solution to Problem 2, part f.

The original message is 44 bytes long, and we know from Problem Set 2 that LZW can encode it in 32 bytes, with 31 characters in the dictionary. Thus we need 32 + 14 = 46 different dictionary entries, for a total of six bits per byte (since 2^6 = 64 >= 46). Thus we can compact the message down to 32 x 6 = 192 bits. Straight fixed-length encoding needs 176 bits, and Huffman encoding needs 149 bits. Thus Huffman encoding does the best job of compacting the material.

A lower bound on sending the Huffman codebook is the total number of bits in the code. This is equal to 2 + 2 + 3 + 4 + 4 + 4 + 4 + 5 + 6 + 6 + 6 + 6 + 6 + 6 = 64 bits. If we imagine that we need to send some control bits along, perhaps something like five bits between each code (a reasonable estimate), this is an additional 5 x (14 + 1) = 75 bits. So we have a lower-bound estimate of 64 + 75 = 139 bits for transmitting the codebook.
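As a cross-check on parts e and f, the codebook of Table 4-4 can be applied to the message directly (our sketch, not part of the original solution):

    # The codebook is copied from Table 4-4.
    codebook = {
        "p": "00", "e": "01", " ": "110",
        "c": "1000", "i": "1001", "k": "1010", "r": "1011",
        "d": "11100", "a": "111010", "f": "111011",
        "l": "111100", "o": "111101", "s": "111110", "t": "111111",
    }
    message = "peter piper picked a peck of pickled peppers"
    encoded = "".join(codebook[ch] for ch in message)
    print(len(encoded))                              # 149 bits, as in Table 4-5
    print(sum(len(cw) for cw in codebook.values()))  # 64 raw codebook bits (part f)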

Thus a fixed-length code requires 176 bits, LZW needs 192 bits, and Huffman coding with the transmission of the codebook requires an estimated 149 + 139 = 288 bits.

MIT OpenCourseWare
http://ocw.mit.edu

6.050J / 2.110J Information and Entropy
Spring 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.