An introduction to basic information theory

Hampus Wessman

Abstract

We give a short and simple introduction to basic information theory, by stripping away all the non-essentials. Theoretical bounds on how efficiently information can be reliably communicated, with and without noise, are presented with proofs. All the required background is also discussed.

Contents

1 Introduction
2 Information and entropy
  2.1 Information sources
  2.2 What is entropy?
  2.3 A measure of information
3 Noiseless communication
  3.1 Introduction
  3.2 Instantaneous codes
  3.3 The noiseless coding theorem
4 Noisy communication
  4.1 Noisy channels
  4.2 Channel capacity
  4.3 Decoding
  4.4 The noisy-channel coding theorem
Appendix

1 Introduction

Information theory is a branch of mathematics which has its roots in the groundbreaking paper [3], published by Claude Shannon in 1948. Many of the basic results were already introduced by Shannon back then. Some proofs are quite sketchy in Shannon's paper, but more elaborate proofs have been published afterwards. There are a few introductory books on the subject (e.g. [1]).

We seek here to present some of the most basic results of information theory, in as simple a way as possible. Both Shannon's classic paper and many later books on the subject present the theory in a quite general way. This can be very useful, but it also makes the proofs and presentation longer and harder to comprehend. By restricting ourselves to the simplest possible cases (that are still interesting, of course) and constructing short and simple proofs, we hope to give a shorter and clearer introduction to the subject. The basic results will still be similar and further details can easily be found elsewhere.

This text is loosely based on [1] and [3]. Most of the theorems, definitions and ideas are variations of things that are presented in these two works. The basic proof ideas are often similar too, but the actual proofs are quite different. Whenever possible, the proofs use less general and more compact arguments. Some new concepts are also introduced, to simplify the presentation. The basic theory discussed here is fairly well known by now. Most of it originates from [3]. The text also makes use of some other well-known terminology and results from both basic mathematics and computer science (we try to explain everything that is not entirely obvious). We don't include references in the text for this, but that does not mean that we claim it to be original. The proof of theorem 7 is based on the classic approach used by Shannon in [3]. Shannon's proof is quite sketchy and a bit more general, though, so this proof is a more detailed variant. The introduction of definition 10 simplifies the presentation somewhat. The rest of the theory is fairly standard and not very complicated. Instantaneous codes are sometimes called prefix codes and are a well-known concept (see e.g. [1] and [2]).

In section 2, we introduce the fundamental concepts of information and entropy. Section 3 then discusses how efficiently information can be sent over noiseless channels and section 4 discusses the same thing when the channel is noisy. The main results can be found at the end of sections 3 and 4.

2 Information and entropy

2.1 Information sources

We begin by discussing what we mean by information. The fundamental problem that we are interested in is how to communicate a message that is selected from a set of possible messages. If there is only one possible message, then the problem is trivial, because we always know what message will be or was chosen. A message like that doesn't convey any information. A more interesting information source would be one that chooses messages from a set of several possible messages. One way to generate such messages in real life would be to repeatedly throw a die and choose the number you get as your message.

In this case the message that you threw a 2 does convey some information. We will soon define a measure of how much information is produced by a simple information source like this.

For the rest of this text, the message chosen by an information source will simply be a random variable X that takes on one of the values x_1, x_2, ..., x_n with probabilities p_1, p_2, ..., p_n respectively. This simple model is sufficient for our purposes. Sometimes, we will make several independent observations of the random variable.

2.2 What is entropy?

If someone throws a fair coin without revealing the result, then you can't be sure what the result was. There is a certain amount of uncertainty involved. Let's say that the same person throws a two-headed coin instead. This time, you can be completely sure what the result was. In other words, the result is not uncertain at all this time. Entropy is a measure of this uncertainty. Its usefulness will be seen later. Right now, we simply give the definition.

Definition 1. The entropy of the probabilities p_1, p_2, ..., p_n is

    H(X) = H(p_1, p_2, \dots, p_n) = -\sum_{i=1}^{n} p_i \log_2 p_i.

Note that we use the binary logarithm here. Entropy is measured in the unit of bits. A few simple examples follow. The entropy of throwing a fair coin is exactly 1.0 bit. Throwing a two-headed coin gives 0.0 bits of entropy. The entropy of throwing a six-sided die is approximately 2.58 bits.

We will also have use for conditional entropies. Let's say that we have two random variables X and Y and that we know that Y = y. We then define the conditional entropy of X given that Y = y as follows.

Definition 2.

    H(X|Y = y) = -\sum_{i=1}^{n} P(X = x_i \mid Y = y) \log_2 P(X = x_i \mid Y = y),

where P(X = x_i | Y = y) is the conditional probability that X = x_i given that Y = y. Let us also define the conditional entropy of X given Y.

Definition 3.

    H(X|Y) = \sum_{i=1}^{m} P(Y = y_i) H(X|Y = y_i),

where P(Y = y_i) is the probability that Y = y_i. Here, we assume that Y is a discrete random variable that takes on one of the values y_1, y_2, ..., y_m. P will denote probabilities throughout the text.
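The entropy values quoted above are easy to reproduce. The following short Python sketch (an illustration, not part of the original text; the function name entropy is our own choice) evaluates the formula of definition 1 for the coin and die examples.

    import math

    def entropy(probabilities):
        """H(p_1, ..., p_n) = -sum_i p_i * log2(p_i); terms with p_i = 0 contribute 0."""
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
    print(entropy([1.0]))        # two-headed coin: 0.0 bits
    print(entropy([1/6] * 6))    # six-sided die: about 2.585 bits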

2.3 A measure of information

We can now define a measure of information. Assume that we have two random variables X and Y, whose values are both unknown. If the value of Y is revealed to us, how much will that tell us about X? We define the amount of information conveyed by doing so to be the corresponding decrease in entropy.

Definition 4. The amount of information revealed about X when given Y is

    I(X|Y) = H(X) - H(X|Y).

This value will always be non-negative, but we don't prove that here.

3 Noiseless communication

3.1 Introduction

Given an information source, we now turn to the problem of encoding the messages from the information source so that they can be efficiently communicated over a noiseless channel. By a channel, we simply mean something that transmits binary data by accepting it at some point and reproducing it at some other point. A noiseless channel does this without introducing any errors. In reality we can think of sending data through a computer network (e.g. over the Internet) or storing it on a DVD to later be read back again (DVD discs use error correction codes internally; imagine here that we just store a file and assume that we don't need to handle read errors). It doesn't really matter what we do with the data, as long as we assume that it can be perfectly recalled later.

The only requirement we have on the encoded message is that it should be possible to recreate the original message. In particular, we don't care if decoding is expensive. Because we don't need to handle communication failures of any kind (we assume there will be no errors), it will be most efficient to encode the information as compactly as possible. At least, this is what we will strive for here. We will investigate the theoretical limits of how compactly data can be encoded on average. In real-world applications, it may not be feasible to go that far, for various reasons. The theory can be made more general, but we restrict ourselves to binary codes here.

Let us have a random variable X, like earlier. For each x_i we will assign a codeword c_i that consists of a sequence of one or more bits (bit means binary digit here, that is, a 0 or a 1). Together, these code words make up a code. The code words can have different lengths. We observe the random variable one or more times, independently, to generate a sequence. Such a sequence will be called a message.

3.2 Instantaneous codes

Not all codes are of interest, because they can't always be uniquely decoded. Let's define what we mean by that.

Definition 5. A code is uniquely decodable if every finite sequence of bits corresponds to at most one message, according to that code.

Here we assume that the message is encoded by concatenating the code words for each symbol in the message to create a sequence of bits that corresponds to that message. Decoding simply tries to do the reverse. We will take a closer look at one type of uniquely decodable codes.

Definition 6. A code is called an instantaneous code, if no code word is a prefix of any other code word. (A code word of length n_1 is called a prefix of another code word of length n_2 if n_1 ≤ n_2 and the first n_1 bits of the two code words are identical.)

In particular, two code words can't be equal in an instantaneous code. We now show that codes of this type are always uniquely decodable.

Theorem 1. Instantaneous codes are uniquely decodable.

Proof. Let's start by concluding that there is a one-to-one correspondence between a sequence of code words and a message. Let's assume that we have a finite sequence of bits, an instantaneous code and two sequences of code words from that code, m_1 and m_2, which both agree with the sequence of bits. We need to show that m_1 = m_2 or, in other words, that there can't be two different such sequences. Consider the first code word in m_1 and m_2. What if they are not the same? That is not possible, because then one (the shorter, or both) would be a prefix of the other one and then the code is not instantaneous, which we assumed. We can now ignore the first code words (because they are identical) and look at the rest of the code words and the rest of the bits. The same argument can be applied again and by induction the whole sequences m_1 and m_2 must be the same.

The following theorem tells us when it is possible to construct an instantaneous code.

Theorem 2. There exists an instantaneous code with code words of the lengths n_1, n_2, ..., n_m if and only if

    \sum_{i=1}^{m} 2^{-n_i} \le 1.

Proof. Without loss of generality, assume that n_1 ≤ n_2 ≤ ... ≤ n_m = k. There are 2^k possible code words of length k. Let's call these base words. Any code word of length n ≤ k can be constructed by selecting one of these base words and taking a prefix of length n from it. Now, let's construct a code by selecting a code word of length n_1 and then one of length n_2 and so on.

When we are selecting the first code word, any code word is possible (because there are no other code words that it can conflict with). Let's choose any code word of length n_1. This code word will be a prefix of 2^{k-n_1} base words. No code word after this can be a prefix of these base words, so they are now excluded. Now, select a code word of length n_2. We can choose any non-excluded base word and take a prefix of length n_2 from that. This is possible, because we know that no other code word (so far) is a prefix of these base words and all chosen code words are at most the same length as this one, so this is enough. On the other hand, no other code words are possible, because then there already exists a code word that is a prefix of it. After choosing a code word of length n_2 (if possible), we need to exclude 2^{k-n_2} new base words. We continue like this until we have chosen all the code words we need.

At every step we pick a prefix of any non-excluded base word, if there is any. No other choices are possible. It will be possible to select all m code words, so that they form an instantaneous code, if and only if at most 2^k base words are excluded after selecting them all. It doesn't really matter which code words we actually choose along the way. This is the same as requiring that

    \sum_{i=1}^{m} 2^{k-n_i} \le 2^k,

which is equivalent to what we wanted to prove.

We will only discuss instantaneous codes, but the following theory can be generalized to all uniquely decodable codes. The results will be similar.

3.3 The noiseless coding theorem

The following theorem establishes a lower bound on how compactly information can be encoded. We prove later that there exist codes close to this bound.

Definition 7. We call

    \bar{n} = \sum_{i=1}^{n} p_i n_i

the average code word length of a code.

Theorem 3 (Noiseless coding theorem). Given a random variable X and a corresponding instantaneous code, \bar{n} is bounded below by H(X), that is, H(X) \le \bar{n}.

Proof. The following facts will be needed (the first is from theorem 2):

    \sum_{i=1}^{n} 2^{-n_i} \le 1    (1)

    \ln(x) \le x - 1, and hence \log_2(x) \le \log_2(e)(x - 1)    (2)

First rewrite the inequality from the theorem a bit.

    H(X) \le \bar{n}
    \iff -\sum_{i=1}^{n} p_i \log_2(p_i) \le \sum_{i=1}^{n} p_i n_i
    \iff -\sum_{i=1}^{n} p_i \log_2(p_i) - \sum_{i=1}^{n} p_i \log_2(2^{n_i}) \le 0
    \iff \sum_{i=1}^{n} p_i \log_2 \frac{2^{-n_i}}{p_i} \le 0

Now we are almost done. It follows from (1) and (2) that

    \sum_{i=1}^{n} p_i \log_2 \frac{2^{-n_i}}{p_i}
    \le \log_2(e) \sum_{i=1}^{n} p_i \left( \frac{2^{-n_i}}{p_i} - 1 \right)
    = \log_2(e) \left( \sum_{i=1}^{n} 2^{-n_i} - \sum_{i=1}^{n} p_i \right)
    = \log_2(e) \left( \sum_{i=1}^{n} 2^{-n_i} - 1 \right) \le 0,

which is exactly the rewritten inequality, so the proof is complete.
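As an aside (the following Python sketch is not part of the original text; the source and code are a hypothetical example), it is easy to check theorems 2 and 3 numerically. The instantaneous code 0, 10, 110, 111 has lengths 1, 2, 3, 3, its Kraft sum equals 1, and for the probabilities below its average code word length exactly matches the entropy bound.

    import math

    def kraft_sum(lengths):
        """Left-hand side of the inequality in theorem 2: sum_i 2^(-n_i)."""
        return sum(2 ** -n for n in lengths)

    def average_length(probabilities, lengths):
        """Average code word length: sum_i p_i * n_i (definition 7)."""
        return sum(p * n for p, n in zip(probabilities, lengths))

    def entropy(probabilities):
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    p = [0.5, 0.25, 0.125, 0.125]
    lengths = [1, 2, 3, 3]          # lengths of the code words 0, 10, 110, 111

    print(kraft_sum(lengths))           # 1.0 <= 1, so such a code exists (theorem 2)
    print(entropy(p))                   # H(X) = 1.75 bits
    print(average_length(p, lengths))   # average length 1.75 >= H(X) (theorem 3)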

We still don't know how close we can get to this bound, in general. It turns out that we can get very close.

Theorem 4. Given a random variable X, there exists an instantaneous code such that

    H(X) \le \bar{n} < H(X) + 1.

Proof. For each i, choose n_i as the integer that satisfies

    -\log_2(p_i) \le n_i < -\log_2(p_i) + 1.

It follows from theorem 2 that there exists an instantaneous code with these code word lengths, because

    \sum_{i=1}^{n} 2^{-n_i} \le \sum_{i=1}^{n} 2^{\log_2(p_i)} = \sum_{i=1}^{n} p_i = 1.

Furthermore, we note that by multiplying by p_i above and summing over all code words, we get that

    -\sum_{i=1}^{n} p_i \log_2(p_i) \le \sum_{i=1}^{n} p_i n_i < -\sum_{i=1}^{n} p_i \log_2(p_i) + \sum_{i=1}^{n} p_i,

which is equivalent to the inequality in the theorem.

It is now easy to see that it is possible, relatively speaking, to get arbitrarily close to the lower bound, by encoding several symbols together as a block and letting the length of this block tend towards infinity. If we encode a block of N symbols together, using an instantaneous code, then it is possible for each symbol to use less than 1/N bits above the lower bound on average. Optimal instantaneous codes can be efficiently constructed by using Huffman's algorithm. See almost any book about algorithms, e.g. [2] (also [1]).

4 Noisy communication

4.1 Noisy channels

We will now discuss the problem of sending information over an unreliable communication channel. The discussion will be restricted to binary memoryless symmetric channels. The channel accepts a sequence m_1 of n bits at one point and delivers another sequence m_2 of n bits at another point. m_2 is constructed by adding noise to m_1. The noise is produced independently for each bit. With a probability of p the bit is unchanged and with a probability of q = 1 - p the bit is flipped (flipping a bit is the same as changing its value; there is only one way to do that).

A channel with p = 1/2 is completely useless (the received sequence is independent of the sequence that was sent). When p = 1, the channel is noiseless and the theory from the last section applies. If p < q, then we can just flip all bits in m_2 before delivering it, to get p > q instead. We will assume that 1/2 < p < 1. This means that 0 < q < 1/2. We call q the bitwise error probability.
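To make the channel model concrete, here is a small Python sketch (an illustration, not part of the original text; the parameter values are arbitrary) that simulates one block transmission over a binary memoryless symmetric channel with bitwise error probability q.

    import random

    def binary_symmetric_channel(bits, q, rng=random):
        """Flip each bit independently with probability q and return the result."""
        return [b ^ 1 if rng.random() < q else b for b in bits]

    sent = [random.randint(0, 1) for _ in range(20)]
    received = binary_symmetric_channel(sent, q=0.1)
    errors = sum(s != r for s, r in zip(sent, received))
    print(sent)
    print(received)
    print("number of flipped bits:", errors)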

Let B be the set of all possible bit sequences of length n. There are 2^n elements in B. We will always send a block of n bits over the channel at a time (but we may choose to vary n). Furthermore, assume that there is an information source which randomly selects a message from a set of ⌊2^{Rn}⌋ possible messages, where R is a real number, 0 ≤ R ≤ 1, and ⌊x⌋ is the floor function of x. Let M denote the set of all possible messages. R is called the rate of the information source.

Assign an element c(m) ∈ B to each m ∈ M. These are called code words. The mapping of messages to code words is called a code. Note that we don't require the code to assign unique code words to each message. Our goal is to send c(m) over the channel and recover m from the received sequence r ∈ B with high probability. We will later show when this is possible.

4.2 Channel capacity

The following definition will be very important. Let X be a random variable that takes on one of the values 0 and 1 with the corresponding probabilities p_0 and p_1. We send the value of X over our noisy channel. Let Y be the received bit. Then I(X|Y) is a measure of the information conveyed by sending this bit over the channel. It depends on p_0 and p_1 (and the channel).

Definition 8. The channel capacity (for this kind of channel) is

    C = \max_{p_0, p_1} I(X|Y) = \max_{p_0, p_1} \left( H(X) - H(X|Y) \right).

The channel capacity is defined similarly for more complicated channels. In our case, the maximum is always achieved when p_0 = p_1 = 1/2 and with those probabilities we get that

    C = H(X) - H(X|Y) = 1 + p \log_2(p) + q \log_2(q).

We will continue to assume that X and Y are as above (with p_0 = p_1 = 1/2), so that

    H(X|Y) = -(p \log_2(p) + q \log_2(q)).    (3)
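The capacity formula above is easy to evaluate numerically. The Python sketch below (not part of the original text) computes C for a few values of q.

    import math

    def bsc_capacity(q):
        """C = 1 + p*log2(p) + q*log2(q) for a binary symmetric channel with
        bitwise error probability q and p = 1 - q; terms that are 0 are skipped."""
        p = 1 - q
        return 1 + sum(x * math.log2(x) for x in (p, q) if x > 0)

    for q in (0.0, 0.01, 0.1, 0.25, 0.5):
        print(f"q = {q:4.2f}  ->  C = {bsc_capacity(q):.4f} bits per channel use")
    # q = 0 gives C = 1 (noiseless channel), q = 0.5 gives C = 0 (useless channel).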

4.3 Decoding

We will present one way to decode the received bit sequence here. It will not necessarily be the best possible way, but it will be sufficient for our purposes. Without any noise, it would be fairly trivial to decode the received sequence. Let us therefore examine the noise that is introduced by the channel. We are mainly interested in the number of bits that are flipped. We will call this the number of errors.

Definition 9 (Hamming distance). For x, y ∈ B, let d(x, y) be the number of bits that are different in x and y, when comparing bits at the same position in each sequence.

When we send a message m ∈ M and receive a binary sequence r ∈ B, the number of errors can be written as e = d(c(m), r). Both e and r are random variables here (they are functions of the random noise and m). For each of the n bits in the code word, the probability that there will be a transmission error for that bit is q (as discussed above). The weak law of large numbers directly gives us the following result.

Theorem 5. Let e be the number of errors when sending an arbitrary code word of length n over the channel. Then, for any δ > 0,

    \lim_{n \to \infty} P(n(q - \delta) < e < n(q + \delta)) = 1,

where P(·) denotes the probability.

This is very useful. Let us make a related definition and then see how this may help us decode received sequences.

Definition 10. For any b ∈ B and δ > 0, define the set of potential code words to be

    S(b; \delta) = \{ x \in B : n(q - \delta) < d(b, x) < n(q + \delta) \}.

It follows from theorem 5 that \lim_{n \to \infty} P(c(m) \in S(r; \delta)) = 1 (with m and r as above and δ > 0). We can thus assume that c(m) ∈ S(r; δ) and make the probability that we are wrong arbitrarily small by choosing a large enough n. We therefore look at all messages x, such that c(x) ∈ S(r; δ). If there is only one such message, we will assume that it is m and decode the received sequence as this message. If there is more than one such message, then we simply choose to fail. We have made very few assumptions about the code so far, so we don't know how close together the code words lie. It is even possible that the decoding will always fail. This will be further discussed later.
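The decoding rule just described is straightforward to express in code. The Python sketch below (an illustration, not part of the original text; the two-message codebook is a hypothetical toy example) accepts exactly those code words whose Hamming distance from the received sequence lies strictly between n(q - δ) and n(q + δ), and fails unless exactly one message qualifies.

    def hamming_distance(x, y):
        """Number of positions where the bit sequences x and y differ."""
        return sum(a != b for a, b in zip(x, y))

    def decode(received, codebook, q, delta):
        """Return the unique message whose code word lies in S(received; delta),
        or None if no message or more than one message qualifies (failure)."""
        n = len(received)
        lo, hi = n * (q - delta), n * (q + delta)
        candidates = [message for message, codeword in codebook.items()
                      if lo < hamming_distance(received, codeword) < hi]
        return candidates[0] if len(candidates) == 1 else None

    codebook = {"a": [0] * 8, "b": [1] * 8}
    received = [0, 0, 1, 0, 0, 0, 0, 0]   # c("a") with one bit flipped
    print(decode(received, codebook, q=0.1, delta=0.1))   # prints "a"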

It will be useful to know the number of elements in S(m; δ). The following theorem gives an estimate of that.

Theorem 6. Let |S(m; δ)| denote the number of elements in S(m; δ) and let H(X|Y) be as in equation (3) above. Then, for any ε > 0 and m ∈ B there is a δ > 0 such that

    |S(m; \delta)| \le 2^{n(H(X|Y) + \epsilon)}.

Proof. Assume that we send m ∈ B over the noisy channel and receive a sequence r ∈ B (so that r is a random variable, because of the noise). Let's look at an element x ∈ S(m; δ). The probability that x is the received sequence depends on d(m, x) (this is the number of bit errors that is needed to turn m into x). A sequence with more errors is always less likely. For all x ∈ S(m; δ) we know that d(m, x) < n(q + δ), so that

    P(r = x) \ge p^{n(p - \delta)} q^{n(q + \delta)},    x \in S(m; \delta),

where P(r = x) is the probability that the received sequence is x (when sending m). This, in turn, gives us a lower bound on the following average probability.

    |S(m; \delta)|^{-1} \sum_{x \in S(m;\delta)} P(r = x) \ge p^{n(p - \delta)} q^{n(q + \delta)}

We also know that the probability that r ∈ S(m; δ) is at most 1, so that

    \sum_{x \in S(m;\delta)} P(r = x) \le 1.

All the above shows that

    |S(m; \delta)| = \frac{\sum_{x \in S(m;\delta)} P(r = x)}{|S(m;\delta)|^{-1} \sum_{x \in S(m;\delta)} P(r = x)}
    \le \frac{1}{p^{n(p - \delta)} q^{n(q + \delta)}}
    = 2^{n(-p \log_2(p) - q \log_2(q) + \delta \log_2(p/q))}.

Finally, note that H(X|Y) = -(p \log_2(p) + q \log_2(q)) (from (3) above) and choose δ > 0 such that ε = δ \log_2(p/q) (this is possible, because p > q). Then, we get that

    |S(m; \delta)| \le 2^{n(-p \log_2(p) - q \log_2(q) + \delta \log_2(p/q))} = 2^{n(H(X|Y) + \epsilon)}.

4.4 The noisy-channel coding theorem

For this theorem, we will assume that the situation is like that described above. An information source randomly selects a message m out of ⌊2^{Rn}⌋ possible messages (where ⌊x⌋ is the floor function of x). A code assigns a code word c(m) ∈ B to the message and the code word is sent over the channel. The received sequence r ∈ B is then decoded as described above. The channel capacity is denoted C as before. We say that the communication was successful if we manage to decode r and the decoded message is the sent message m. Otherwise, we say that there was a communication error.

Each message m_i will be selected with a probability of p_i. For a certain code c, let

    P(error \mid c) = \sum_{i} p_i P(error \mid m_i, c)

be the (average) probability of error, where P(error | m_i, c) is the probability of error when sending the message m_i using the code c.

Theorem 7 (Noisy-channel coding theorem). Assume that 0 < R < C, where R is the rate of the information source and C is the channel capacity (see section 4.1 for a description of R and definition 8 for a definition of C). Then, there exists a sequence of codes c_1, c_2, c_3, ..., with code word lengths 1, 2, 3, ..., such that

    \lim_{n \to \infty} P(error \mid c_n) = 0.

Proof. The basic idea is to do as follows:

1. Select a random code, by independently assigning a random code word of length n to each message (with all potential code words being equally likely and duplicates being allowed).

2. Show that the expected value of the error probability (for a randomly chosen code) can be made arbitrarily small by selecting a large enough n.

3. Conclude that, for each n, there must be at least one code whose probability of error is less than or equal to the expected value. Create a sequence of such codes and we are done.

The rest of the proof will elaborate on step 2. Let us begin by taking a closer look at the expected value that is mentioned above. Note that there are N_m = ⌊2^{Rn}⌋ possible messages and N_c = (2^n)^{N_m} possible codes. Let D be the set of all possible codes. For a random code c (that is, c is not a fixed code here), we are interested in

    E(P(error \mid c)) = \sum_{d \in D} N_c^{-1} P(error \mid d)
    = \sum_{d \in D} N_c^{-1} \sum_{i=1}^{N_m} p_i P(error \mid m_i, d)
    = \sum_{i=1}^{N_m} p_i \sum_{d \in D} N_c^{-1} P(error \mid m_i, d).

Let's choose an arbitrary message m and see what \sum_{d \in D} N_c^{-1} P(error \mid m, d) will be. This is simply the probability of error when randomly choosing a code (as above) and then sending the message using that code. We decode the received sequence r as before. It will be more convenient to calculate the probability of success, so we will focus on that.

For the communication to be successful, two things need to happen. First, we must have that c(m) ∈ S(r; δ). If that is true, then for all other messages x ∈ M\{m} we must have that c(x) ∉ S(r; δ). If both these things are true, then the communication will be successful. These two events are independent, because the code words c(x) for x ≠ m are chosen independently of c(m) and of the noise, and the number of elements in S(r; δ) does not depend on r. The probability of success is thus the product of the probabilities of each of these things happening. We will let P(·) denote probabilities. In other words, when using a random code c, we have that

    1 - \sum_{d \in D} N_c^{-1} P(error \mid m, d) = P(c(m) \in S(r; \delta)) \cdot P(\forall x \in M \setminus \{m\} : c(x) \notin S(r; \delta)).

We already know that \lim_{n \to \infty} P(c(m) \in S(r; \delta)) = 1 for any δ > 0. It remains to be shown that also \lim_{n \to \infty} P(\forall x \in M \setminus \{m\} : c(x) \notin S(r; \delta)) = 1, at least for one δ > 0. In that case, it is clear that \lim_{n \to \infty} E(P(error \mid c)) = 0, as required above.

What is the probability that no other message has its code word in S(r; δ)? The code is selected by independently choosing a random code word for each message. It is, therefore, quite easy to calculate this. There are ⌊2^{Rn}⌋ - 1 other messages and 2^n possible code words. Note that |S(r; δ)| ≤ 2^{n(H(X|Y)+ε)} and H(X|Y) = 1 - C (see equation (3) above). The probability is

    P(\forall x \in M \setminus \{m\} : c(x) \notin S(r; \delta))
    = \left( 1 - \frac{|S(r; \delta)|}{2^n} \right)^{\lfloor 2^{Rn} \rfloor - 1}
    \ge \left( 1 - \frac{2^{n(H(X|Y) + \epsilon)}}{2^n} \right)^{2^{Rn}}
    = \left( 1 - 2^{n(H(X|Y) - 1 + \epsilon)} \right)^{2^{Rn}}
    = \left( 1 - 2^{-n(C - \epsilon)} \right)^{2^{Rn}}.

Now, choose 0 < ε < C - R and a corresponding δ > 0 (see theorem 6) and let t = 2^{Rn} and k = (C - ε)/R.

Then k > 1 and t → ∞ when n → ∞. We get that

    \left( 1 - 2^{-n(C - \epsilon)} \right)^{2^{Rn}} = \left( 1 - t^{-k} \right)^{t}.

It is easy to show that \lim_{t \to \infty} (1 - t^{-k})^t = 1 (see the appendix). We can therefore make the probability of error arbitrarily small, by choosing a large enough n and suitable ε and δ. This completes our proof.
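To get a feel for the random-coding argument (this simulation is not part of the original text and all parameter values are arbitrary), one can draw a random code as in step 1 of the proof, send messages over a simulated channel and decode with the rule from section 4.3. With the rate R fixed below the capacity C, the observed error rate tends to fall as the block length n grows, in line with theorem 7. The block lengths are kept small here only to keep the running time reasonable.

    import math
    import random

    def hamming(x, y):
        return sum(a != b for a, b in zip(x, y))

    def estimated_error_rate(n, R, q, delta, trials=100, seed=0):
        """Draw one random code of rate R and block length n, send random messages
        over a binary symmetric channel with error probability q and decode with
        the typical-set rule of section 4.3; return the observed error rate."""
        rng = random.Random(seed)
        num_messages = int(math.floor(2 ** (R * n)))
        codebook = [[rng.randint(0, 1) for _ in range(n)] for _ in range(num_messages)]
        lo, hi = n * (q - delta), n * (q + delta)
        errors = 0
        for _ in range(trials):
            m = rng.randrange(num_messages)
            received = [b ^ 1 if rng.random() < q else b for b in codebook[m]]
            candidates = [i for i, c in enumerate(codebook)
                          if lo < hamming(received, c) < hi]
            if candidates != [m]:
                errors += 1
        return errors / trials

    # q = 0.1 gives a capacity C of roughly 0.53; the rate R = 0.2 is safely below it.
    for n in (20, 50):
        print(n, estimated_error_rate(n, R=0.2, q=0.1, delta=0.08))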

Appendix

We will show here that \lim_{t \to \infty} (1 - t^{-k})^t = 1, when k > 1. We will make the change of variables t = s^{-1} and in the third step we make use of l'Hôpital's rule.

    \lim_{t \to \infty} \left( 1 - t^{-k} \right)^t
    = \exp\left( \lim_{t \to \infty} t \ln\left( 1 - t^{-k} \right) \right)
    = \exp\left( \lim_{s \to 0^+} \frac{\ln\left( 1 - s^k \right)}{s} \right)
    = \exp\left( \lim_{s \to 0^+} \frac{-k s^{k-1}}{1 - s^k} \right)
    = \exp\left( \frac{\lim_{s \to 0^+} \left( -k s^{k-1} \right)}{\lim_{s \to 0^+} \left( 1 - s^k \right)} \right)
    = \exp\left( \frac{0}{1} \right) = e^0 = 1.
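A quick numerical check (not part of the original text) illustrates both the limit and why k > 1 is needed:

    for k in (1.5, 3.0):
        for t in (10, 1_000, 100_000):
            print(k, t, (1 - t ** -k) ** t)
    # For k > 1 the values approach 1 as t grows; for k = 1 the same expression
    # would instead approach 1/e, which is why the proof requires k > 1.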

References

[1] Robert B. Ash, Information Theory. Dover Publications, New York.

[2] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, Introduction to Algorithms. MIT Press and McGraw-Hill.

[3] Claude E. Shannon, A Mathematical Theory of Communication. The Bell System Technical Journal, 1948.
