SOPHOMORE COLLEGE MATHEMATICS OF THE INFORMATION AGE
SHANNON'S THEOREMS

Let's recall the fundamental notions of information and entropy. To repeat, Shannon's emphasis is on selecting a given message from a collection of possible messages. The set of all possible messages is called the source, and one might say that information (before transmission, at least) is less about what you do say than it is about what you can say.

There are various legends about Shannon's choice of the term "entropy" to describe the average information of a source. The one that's repeated most often is John von Neumann's advice: "Call it entropy. No one knows what entropy is, so if you call it that you will win any argument." This little pointer on linguistic wrangling notwithstanding, there are many similarities in form and substance between Shannon's definition of entropy and the term as it is used in physics, especially in thermodynamics and statistical mechanics.

For the definitions: consider a source consisting of $N$ symbols $s_1, \dots, s_N$. In terms of asking yes-no questions, if it takes $I$ questions to select one particular symbol then a binary search is the best strategy, e.g., "Is the particular symbol in the first half of the list of symbols? Is it in the first half of the first half?" Etc. This leads to the relation $2^I = N$, or $I = \log_2 N$. This assumes that the symbols are equally probable.

We talked a little about what this formula means if $N$ is not a power of 2, so that $\log_2 N$ is not a whole number. Take, say, $N = 9$, which has $\log_2 9 \approx 3.17$. Playing the game optimally, the number of questions we need to ask for a particular round clearly satisfies
$$ \log_2 9 \le \#\text{questions} \le \log_2 9 + 1, $$
so at least three and at most four questions. Likewise, if we're trying to determine one of $N$ possible symbols, then
$$ \log_2 N \le \#\text{questions} \le \log_2 N + 1. $$

In general, the idea is that the number of questions one needs to ask on average, if the game is played many times, should tend to $\log_2 N$. Let's be a little more precise about this. If we play the game $m$ times, it's as if we have $m$ players, each playing the game once and picking out an unknown $m$-tuple $(s_1, s_2, \dots, s_m)$. There are $N^m$ possible unknown $m$-tuples, all equally probable. If it takes $I_m$ questions to pick one out then, as above,
$$ \log_2 N^m \le I_m \le \log_2 N^m + 1. $$
Write this as
$$ m \log_2 N \le I_m \le m \log_2 N + 1, $$
or
$$ \log_2 N \le \frac{I_m}{m} \le \log_2 N + \frac{1}{m}. $$

Now, $I_m$ is the number of questions it takes to guess $m$ symbols, so $I_m/m$ is the number of questions it takes to pick out one symbol, on average. We can make $I_m/m$ arbitrarily close to $\log_2 N$ by taking $m$ large.
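As a quick numerical illustration of this limit (a minimal Python sketch of mine, not part of the original notes): with $N = 9$, take $I_m = \lceil \log_2 9^m \rceil$, the worst-case number of yes-no questions a binary search over the $9^m$ equally likely $m$-tuples needs, and watch $I_m/m$ approach $\log_2 9 \approx 3.17$.

    import math

    N = 9
    print(f"log2({N}) = {math.log2(N):.4f}")

    for m in (1, 2, 5, 10, 100, 1000):
        # I_m: worst-case number of yes/no questions to single out one of the
        # N^m equally likely m-tuples by repeated halving (binary search).
        I_m = math.ceil(m * math.log2(N))
        print(f"m = {m:5d}   I_m/m = {I_m / m:.4f}")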

Back to the general definition of information content. If, more realistically, the symbol $s_n$ has an associated probability $p_n$, then the information of the message $s_n$ is
$$ I(s_n) = \log \frac{1}{p_n} \quad \text{bits.} $$
(We'll always use logs to base 2 and we'll drop the subscript. To compute, you can relate the log base 2 to the log base 10, for example, by the formula $\log_2 x = \log_{10} x / \log_{10} 2 \approx 3.32 \, \log_{10} x$.) Note that if the symbols are equally probable then they each have probability $1/N$ and the formula gives $I(s_n) = \log N$, as before. The entropy is the average information of the source; that is, it is the weighted average of the information of each of the symbols according to their respective probabilities:
$$ H = \sum_{n=1}^{N} p_n I(s_n) = \sum_{n=1}^{N} p_n \log \frac{1}{p_n}. $$

Logs add... and so does information. Here's a nice bit of intuition bundling up information, logs, and old, familiar identities. Suppose that a message $A$ occurs with probability $p_1$ and another message $B$ occurs with probability $p_2$. What is the information associated with receiving both messages? Intuitively, we feel that the informations should add; that is, the information associated with receiving both messages (our surprise at receiving both) should be the sum of the information associated with each, provided they are independent. You wouldn't say "It's raining outside" and "The sidewalk is wet" are independent messages, but you would say that "It's raining outside" and "I play the trombone" are independent. One fundamental idea in probability is that the probability of two independent events occurring is the product of the probabilities of each. (Think of coin tossing, or rolling a die, for common examples.) Thus, as we've written it, the probability of both messages occurring is $p_1 p_2$, and the information associated with that is
$$ I(A \text{ and } B) = \log \frac{1}{p_1 p_2}. $$
But, we all know, the log of a product is the sum of the logs, and hence
$$ I(A \text{ and } B) = \log \frac{1}{p_1 p_2} = \log \frac{1}{p_1} + \log \frac{1}{p_2} = I(A) + I(B). $$
Turning this around, if you ever wanted an intuitive reason why the log of a product should be the sum of the logs, think of this: the information of the combined message is the sum of the informations.
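A small Python sketch (my own illustration, with made-up probability values) of the two facts just stated: the information $\log(1/p)$ adds for independent messages, and the entropy is the probability-weighted average of the informations.

    import math

    def info(p):
        """Information (in bits) of a message that occurs with probability p."""
        return math.log2(1 / p)

    def entropy(probs):
        """Entropy of a source: weighted average of the informations."""
        assert abs(sum(probs) - 1.0) < 1e-12
        return sum(p * info(p) for p in probs)

    # Independent messages: information adds because probabilities multiply.
    p1, p2 = 0.25, 0.1    # hypothetical probabilities
    print(info(p1 * p2), info(p1) + info(p2))   # both equal 5.3219...

    # Equally probable symbols: entropy is log2 N.
    N = 8
    print(entropy([1 / N] * N))   # 3.0 = log2(8)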

An important special case of entropy. Let's take a special case of an entropy calculation. Suppose we have a source consisting of just two messages $A$ and $B$, and suppose $A$ occurs with probability $p$. Since only $A$ or $B$ can occur, and one of them must occur, the probability that $B$ occurs is then $1 - p$. Thus the entropy of the source is
$$ H(\{A, B\}) = p \log \frac{1}{p} + (1 - p) \log \frac{1}{1 - p}. $$
How does this look as a function of $p$, for $0 < p < 1$, and for the limiting cases $p = 0$ and $p = 1$? Here's a graph:

[Graph of $H$ as a function of $p$ on the interval $0 \le p \le 1$.]

At $p = 0$ or $p = 1$ the entropy is zero, as it should be (there is no uncertainty), and the entropy is a maximum at $p = 1/2$, also as it should be (each message is equally likely). The maximum value is 1. These facts can be verified with some calculus; I won't do that.
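A quick numerical check of these claims (a sketch of mine, not from the notes): evaluate $H(p)$ on a grid and confirm the maximum value 1 occurs at $p = 1/2$, with $H = 0$ at the endpoints.

    import math

    def H2(p):
        """Entropy (in bits) of a two-message source with probabilities p and 1 - p."""
        if p in (0.0, 1.0):
            return 0.0   # limiting value: no uncertainty
        return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

    grid = [i / 100 for i in range(101)]
    values = [H2(p) for p in grid]
    best = max(range(101), key=lambda i: values[i])
    print("max of H(p) is", values[best], "at p =", grid[best])   # 1.0 at p = 0.5
    print("H(0) =", H2(0.0), " H(1) =", H2(1.0))                  # both 0.0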

Unequal probability forces entropy down. Let's derive one general property of entropy, a property that can sometimes help your intuition. Again suppose we have a source of $N$ messages $S = \{s_1, s_2, \dots, s_N\}$. If the probabilities are equal then they are all equal to $1/N$, the information content of each message is $\log N$, and the entropy of the source is
$$ H(S) = \sum_{n=1}^{N} \frac{1}{N} \log N = \frac{1}{N} \log N \sum_{n=1}^{N} 1 = \log N. $$
Makes sense. Let's show that if the messages are not equiprobable then the entropy goes down, i.e., let's show that $H(S) \le \log N$ and that equality holds only when the probabilities are equal. There are several ways one can show this (at least two ways to my knowledge). We'll use an approach based on optimization, just for practice in some ideas you may not have seen for a while (and certainly not in this context). (The other approach is via Jensen's inequality, a general result on convex functions. That's the better approach, really, but we'd have to develop Jensen's inequality. Worthwhile, but maybe another time.)

Suppose the probabilities are $p_1, p_2, \dots, p_N$. We want to find the extremum of
$$ H(p_1, p_2, \dots, p_N) = \sum_{n=1}^{N} p_n \log \frac{1}{p_n} \quad \text{subject to the constraint} \quad p_k > 0 \text{ and } \sum_{n=1}^{N} p_n = 1. $$
The constraint is the fact that the probabilities add to 1. A job for Lagrange multipliers! We might as well use log to base $e$ here, since that only amounts to scaling the entropy, and we know how to differentiate that. Calling the constraint function
$$ g(p_1, p_2, \dots, p_N) = \sum_{n=1}^{N} p_n, $$
at an extremum we have $\nabla H = \lambda \nabla g$. Now,
$$ \frac{\partial}{\partial p_k}\left( p_k \log \frac{1}{p_k} \right) = \log \frac{1}{p_k} + p_k \left( -\frac{1}{p_k} \right) = \log \frac{1}{p_k} - 1, $$
so the condition on the gradients is, componentwise,
$$ \log \frac{1}{p_k} - 1 = \lambda \cdot 1. $$
Thus all the $p_k$'s are equal, and since they add to 1 they must all equal $1/N$. The extremal is a maximum because
$$ \frac{\partial^2}{\partial p_k \, \partial p_l} H(p_1, p_2, \dots, p_N) = -\frac{1}{p_k}\,\delta_{kl}, $$
so the Hessian is negative definite. Done. More on the meaning of entropy after we look briefly at a few sample problems that put entropy to work.
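Before turning to the sample problems, here is a small numerical check (my own sketch) of the fact just proved: for randomly chosen probability vectors the entropy stays below $\log_2 N$, and it equals $\log_2 N$ only for the uniform distribution.

    import math
    import random

    def entropy(probs):
        """Entropy in bits of a probability vector."""
        return sum(p * math.log2(1 / p) for p in probs if p > 0)

    N = 16
    print("log2(N) =", math.log2(N))                     # 4.0
    print("uniform:", entropy([1 / N] * N))              # exactly 4.0

    random.seed(0)
    for _ in range(5):
        # A random probability vector: positive weights, normalized to sum to 1.
        w = [random.random() for _ in range(N)]
        total = sum(w)
        p = [x / total for x in w]
        print("random :", round(entropy(p), 4), "<", math.log2(N))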

Fool's gold. Consider the following weighing problem. (It comes from A Diary on Information Theory, by A. Rényi.) You have 27 apparently identical gold coins. One of them is false and lighter, but is otherwise indistinguishable from the others. You also have a balance with two pans, but no standard, known weights for comparison weighings. Thus you have to compare the gold coins to each other, and any measurement will tell you whether the loaded pans weigh the same or, if not, which weighs more. How many measurements are needed to find the false coin? Before solving the problem, find a lower bound for the number of weighings.

Here's an information-theoretic way of analyzing the problem. First, the amount of information we need to select one of 27 items is $\log 27$ bits, for we want to select one coin from among 27, and "indistinguishable" here translates to "equiprobable". A weighing (putting some coins on one pan of the balance and other coins on the other pan) produces one of three outcomes:

(1) The pans balance.
(2) The left pan is heavier.
(3) The right pan is heavier.

If there is a scheme so that these outcomes are also equiprobable, then a single weighing is worth $\log 3$ bits (each outcome has probability 1/3). In such a scheme, if there are $n$ weighings then we get $n \log 3$ bits of information. Thus we want to choose $n$ so that
$$ n \log 3 \ge \log 27 = 3 \log 3, \quad \text{so} \quad n \ge 3. $$
Any weighing scheme that does not have equally probable outcomes will have a lower entropy, as we showed above. Thus we get less information from such a scheme and so would require more weighings to find the false coin. This analysis suggests we should aim to find the fake coin with three weighings.

Here's a way to weigh. Divide the coins into three piles of 9 coins each and compare two piles. If the scale balances then the fake coin is in the remaining pile of 9 coins. If one pan is heavier, then the other pan has the fake coin. In any event, with one weighing we can now locate the fake coin within a group of nine coins. Within that group, weigh three of those coins against another three. The result of this weighing isolates the fake coin in a group of three. In this last group, weigh two of the coins against each other. That weighing will determine which coin is fake. (Note that another approach in the second step might be lucky. Namely, in a group of nine coins, one of which is known to be bad, weigh four against four. If the scale balances then the leftover coin is the fake, and we've found it with two weighings. But if the scale doesn't balance then we're left with a group of four coins, and it will take two more weighings to find the fake coin.)

Problem: Suppose instead that you have 12 coins, indistinguishable in appearance, but one is either heavier or lighter than the others. What weighing scheme will allow you to find the false coin?
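Back to the 27 coins: here is a short simulation (my own sketch; the coin indexing and helper names are made up) that plays the three-weighing strategy above against every possible position of the light coin and confirms that three comparisons always suffice.

    def weigh(left, right, fake):
        """Balance comparison: the pan containing the (lighter) fake coin goes up.
        Returns -1 if the left pan is lighter, +1 if the right pan is lighter, 0 if balanced."""
        if fake in left:
            return -1
        if fake in right:
            return +1
        return 0

    def find_fake(coins, fake):
        """Locate the light coin by repeatedly splitting the candidates into three piles."""
        weighings = 0
        while len(coins) > 1:
            k = len(coins) // 3
            a, b, c = coins[:k], coins[k:2 * k], coins[2 * k:]
            result = weigh(a, b, fake)
            weighings += 1
            coins = a if result == -1 else b if result == +1 else c
        return coins[0], weighings

    coins = list(range(27))
    assert all(find_fake(coins, f) == (f, 3) for f in coins)
    print("the light coin is always found in exactly 3 weighings")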

Place your bets. Here's an example of applying Shannon's entropy to a two-message source where the probabilities are not equal. (The example is from Silicon Dreams, by Robert W. Lucky.) Suppose we have a roulette wheel with 32 numbers. (Real roulette wheels have more, but $32 = 2^5$, so our calculations will be easier.) You play the game by betting on one of the 32 numbers. You win if your number comes up and you lose if it doesn't. In a fair game all numbers are equally likely to occur. If we consider the source consisting of the 32 statements "The ball landed in number $n$", for $n$ going from 1 to 32, then all of those statements are equally likely and the information associated with each is
$$ I = \log \frac{1}{1/32} = \log 32 = 5. $$
(It takes 5 bits to code the number.) But you don't care about the particular number, really. You want to know if you won or lost. You are interested in the two-message source
$$ W = \text{"You have won"}, \qquad L = \text{"You have lost"}. $$

What is the entropy of this source? The probability of $W$ occurring is $1/32$ and the probability of $L$ occurring is $31/32 = 1 - 1/32$. So the entropy of the source is
$$ H = \frac{1}{32} \log 32 + \frac{31}{32} \log \frac{32}{31} \approx 0.156 + 0.044 \approx 0.2. $$
That's a considerably smaller number than 5; $L$ is a considerably more likely message. This tells us that, on average, you should be able to get the news that you won or lost in about 0.2 bits.

What does that mean, 0.2 bits? Doesn't it take at least one bit for each, say 1 for "You have won" and 0 for "You have lost"? Remember, the notions of 1/32 probability of winning and 31/32 probability of losing only make sense against the backdrop of playing the game many times. Probabilities are statements about percentages of occurrences in the long run. Thus the question, really, is: if you have a series of wins and losses, e.g. $LLLLLWLLLLLLWWLLLL\dots$, how could you code your record so that, on average, it takes something around 0.2 bits per symbol?

Here's one method. We track the win/loss record in 8-event blocks. It's reasonable to expect that 8 events in a row will be losses. As we look at the record we ask a yes/no question: "Are the next 8 events all losses?" If the answer is yes, we let 0 represent that block of 8 losses in a row. So every time this actually happens in the total record we will have resolved 8 outcomes with a single bit, or a single outcome (in that block) with 1/8 of a bit. If the answer to the yes/no question is no, that is, if the next 8 events were not all losses, then we have to record what they were. Here's the scheme:

0 means the next 8 results are LLLLLLLL;
1XXXXXXXX means the 8 X bits specify the actual record (1 for a win, 0 for a loss).

So, for example, 1 00010000 means the next 8 results are LLLWLLLL. So, starting the record at the beginning, we might have, say, 0 0 1 00010000 0 ..., meaning that the first 16 events are losses, the next 8 are LLLWLLLL, the next 8 are losses, and so on.

How well does this do? Look at a block of eight win-loss outcomes. The probability of a single loss is $31/32$ and the probability of 8 straight losses is $(31/32)^8 \approx .776$ (the events are independent, so the probabilities multiply), and it takes 1 bit to specify this. The probability that the 8 results are not all losses is $1 - .776 = .224$, and, according to our scheme, it takes 9 bits to give these results. Thus, on average, the number of bits needed to specify 8 results is
$$ .776 \times 1 + .224 \times 9 = 2.792. $$
Finally, the average number of bits per result is therefore $2.792/8 = .349$. That's much closer to the .2 bits given by calculating the entropy.

Let me raise two questions: Can we be more clever and get the average number of bits down further, and can we even be so sneaky as to break the entropy limit of 0.2 bits per symbol? Is this just a game, or what?
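The 0.349 bits-per-outcome figure is easy to check by simulation. Below is a sketch of mine (not from the notes) that generates a long win/loss record with win probability 1/32, encodes it with the 8-event block scheme above, and reports bits per outcome, alongside the entropy for comparison.

    import math
    import random

    random.seed(1)
    p_win = 1 / 32
    n_blocks = 100_000

    bits = 0
    for _ in range(n_blocks):
        block = [random.random() < p_win for _ in range(8)]   # True = win, False = loss
        if any(block):
            bits += 9        # flag bit '1' plus the 8 literal outcome bits
        else:
            bits += 1        # a single '0' stands for eight losses
    print("bits per outcome:", bits / (8 * n_blocks))          # about 0.349

    H = p_win * math.log2(1 / p_win) + (1 - p_win) * math.log2(1 / (1 - p_win))
    print("entropy bound   :", H)                              # about 0.2006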

By way of example, let me address the second question first.

An example: Run-length coding. Don't think that the strategy of coding blocks of repeated symbols in this way is an idle exercise. There are many examples of messages where there are few symbols and the probabilities strongly favor one over the other, thus indicating that substantial blocks of a particular symbol will occur often. The sort of coding scheme that makes use of this, much as we did above, is called run-length coding.

Here's an example of how run-length coding can be used to code a simple image efficiently. I say "simple" because we'll look at a 1-bit image: just black or white, no shades of gray. The idea behind run-length coding, and using it to compress the digital representation of this image, is to code by blocks rather than by individual symbols. For example, 0X, where X is a binary number, means that the next X pixels are black; 1Y means that the next Y pixels are white. There are long blocks of a single pixel brightness, black or white, and coding blocks can result in a considerable savings of bits over coding each pixel with its own value.

You shouldn't think that 1-bit, black-and-white images are uncommon or not useful to consider. You wouldn't want them for photos (we certainly didn't do the palm tree justice), but text files are precisely 1-bit images. A page of text has letters (made up of black pixels) and white space in between and around the letters (white pixels). Fax machines, for example, treat text documents as images: they don't recognize and record the symbols on the page as characters, only as black or white pixels. Run-length coding is a standard way of compressing black-and-white images for fax transmission. (Actually, fax machines combine run-length coding with Huffman coding, a topic we'll discuss later.)

Run-length coding can also be applied to grayscale images. In that case one codes for a run of pixels all of the same brightness. Typical grayscale images have enough repetition that the compression ratio is about 1.5 to 1, whereas for black-and-white images it may be as high as 10 to 1.
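Here is a minimal sketch of the idea (my own illustration; the fixed 8-bit run-length field and the sample scan line are assumptions, not the fax standard): encode a row of 1-bit pixels as alternating (color, run length) pairs and compare the bit counts.

    from itertools import groupby

    def run_length_encode(pixels):
        """Encode a row of 1-bit pixels (0 = black, 1 = white) as (color, run length) pairs."""
        return [(color, len(list(run))) for color, run in groupby(pixels)]

    # A hypothetical scan line: white margin, a black stroke, white, another stroke, white.
    row = [1] * 200 + [0] * 12 + [1] * 150 + [0] * 8 + [1] * 230

    runs = run_length_encode(row)
    print(runs)                       # [(1, 200), (0, 12), (1, 150), (0, 8), (1, 230)]

    raw_bits = len(row)               # one bit per pixel
    coded_bits = len(runs) * (1 + 8)  # 1 bit for the color + (say) 8 bits for the run length
    print(f"raw: {raw_bits} bits, run-length coded: {coded_bits} bits")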

How low can you go: The Noiseless Source Coding Theorem

Now let's consider the first question I raised in connection with the roulette problem. I'll phrase it like this: Is there a limit to how clever we can be in the coding? The answer is yes: the entropy of the source sets a lower bound for how efficient the coding of a message can be. This is a really striking demonstration of how useful the notion of entropy is. I'd say it's a demonstration of how natural it is, but it only seems natural after it has been seen to be useful; much of mathematics seems to go this way. The precise result is called Shannon's Noiseless Source Coding Theorem.

Before stating and proving the result, we need a few clarifications on coding. The source symbols are the things to be selected (in other words, the things to be sent, if we're thinking in terms of communication). These can have any form: letters, pictures, sounds; they are uncoded. As before, call the source symbols $s_1, \dots, s_N$. The symbols have associated probabilities $p_i$, and the entropy is defined in terms of these probabilities.

Now we code each source symbol, and for simplicity we work exclusively with binary codes. A coded symbol is then represented by binary numbers, i.e., by strings of 0's and 1's. Each source symbol is encoded as a specified string of 0's and 1's; call these the coded source symbols. The 0 and 1 are the code alphabet. (Thus we consider radix 2 codes. If the symbols in the code are drawn from an alphabet of $r$ elements then the code is referred to as radix $r$.) A codeword is a string of coded source symbols, and a message is a sequence of codewords.

The main property we want of any code is that it be uniquely decodable: it should be possible to parse any codeword unambiguously into source symbols. For example, coding symbols $s_1$, $s_2$, $s_3$ and $s_4$ as
$$ s_1 = 0, \quad s_2 = 01, \quad s_3 = 11, \quad s_4 = 00 $$
is not a uniquely decodable code. The codeword 0011 could be either $s_4 s_3$ or $s_1 s_1 s_3$. We'll look at the finer points of this later.

Suppose we have a message using a total of $M$ source symbols. The probabilities of the source symbols are determined by their frequency counts in the message. Thus if the $i$th source symbol occurs $m_i$ times in the message (its frequency count), then the probability associated with the $i$th coded source symbol is the fraction of times it does occur, i.e.,
$$ p_i = \frac{m_i}{M}, $$
and the entropy of the source is
$$ H(S) = \sum_i p_i \log \frac{1}{p_i} = \sum_i \frac{m_i}{M} \log \frac{M}{m_i}. $$
Changing $M$ changes the entropy, for it changes the probabilities of the source symbols that occur. That will in turn affect how we might code the source symbols.
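A short sketch (mine, with a made-up message) of the empirical entropy just defined: count how often each source symbol occurs in a message of $M$ symbols and compute $H(S) = \sum_i (m_i/M)\log(M/m_i)$.

    import math
    from collections import Counter

    def empirical_entropy(message):
        """Entropy (bits per symbol) of a message, with probabilities taken
        from the frequency counts of its symbols."""
        M = len(message)
        counts = Counter(message)                      # m_i for each symbol
        return sum((m / M) * math.log2(M / m) for m in counts.values())

    message = "AAAAABBBCC"   # hypothetical message: m_A = 5, m_B = 3, m_C = 2, M = 10
    print(empirical_entropy(message))   # 0.5*1 + 0.3*log2(10/3) + 0.2*log2(5), about 1.485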

So, with $M$ fixed, code the source symbols (with a uniquely decodable code), and let $A_M$ be the average of the lengths of all the coded source symbols. Thus $A_M$ is a weighted average of the lengths of the individual coded source symbols, weighted according to their respective probabilities:
$$ A_M = \sum_i \frac{m_i}{M} \, \text{length}_i. $$
There are different ways of coding the source, and efficient coding of a message means making $A_M$ small. A code is optimal if $A_M$ is as small as possible. How small can it be? One version of Shannon's answer is:

Noiseless Source Coding Theorem. In the notation above, $A_M \ge H(S)$.

This is called the noiseless source coding theorem because it deals with codings of a source before any transmission; no noise has yet been introduced that might corrupt the message.

We're going to prove this by mathematical induction on $M$. To recall the principle of induction, we: (1) establish that the statement is true for $M = 1$, and (2) assume that the statement is true for all numbers up to and including $M$ (the induction hypothesis) and deduce that it is also true for $M + 1$. The principle of mathematical induction says that if we can do this then the statement is true for all natural numbers.

The first step is to verify the statement of the theorem for $M = 1$. When $M = 1$ there is only one source symbol and the entropy $H = 0$. Any coding of the single message is at least one bit long (and an optimal coding would be exactly one bit, of course), so $A_1 \ge 1 > 0 = H$. We're on our way.

The induction hypothesis is that $A_M \ge H$ for any message with at most $M$ source symbols. Suppose we have a message with $M + 1$ source symbols. The entropy changes, and we want to show that $A_{M+1} \ge H(S)$. Here
$$ H(S) = \sum_i \frac{m_i}{M+1} \log \frac{M+1}{m_i}, $$
where $m_i$ is the frequency count of the $i$th coded source symbol in $S$ in the message with $M + 1$ source symbols.

Code $S$ and split the coded source symbols into two classes:
$$ S_0 = \text{all source symbols in } S \text{ whose coded source symbol starts with a 0,} $$
$$ S_1 = \text{all source symbols in } S \text{ whose coded source symbol starts with a 1.} $$
The numbers of source symbols in the message drawing from $S_0$ and $S_1$, respectively, are computed by
$$ M_0 = \sum_{S_0} m_i \quad \text{and} \quad M_1 = \sum_{S_1} m_i, $$
where the sums go over the coded source symbols in $S$ which are in $S_0$ and $S_1$, respectively. Now
$$ M_0 + M_1 = M + 1, $$

and we can assume that neither $M_0$ nor $M_1$ is zero. (If, say, $M_0 = 0$ then no coded source symbols of $S$ start with a 0, i.e., all coded source symbols start with a 1. But then this leading 1 is a wasted bit in coding the source symbols and this is clearly not optimal.) Thus we can say that $1 \le M_0, M_1 \le M$, and we are set up to apply the induction hypothesis.

Now drop the first bit from the coded source symbols coming from $S_0$ and $S_1$, and write $\text{length}_i'$ for the length of the $i$th coded source symbol with that first bit dropped. Then, first of all, the average of the lengths of the coded source symbols in $S$ is given by the weighted average of the lengths of the coded source symbols in $S_0$ and $S_1$, plus 1 (adding the one comes from the extra bit that's been dropped), namely
$$ A_{M+1} = 1 + \sum_{S_0} \frac{m_i}{M+1} \, \text{length}_i' + \sum_{S_1} \frac{m_i}{M+1} \, \text{length}_i' = 1 + \frac{M_0}{M+1} \sum_{S_0} \frac{m_i}{M_0} \, \text{length}_i' + \frac{M_1}{M+1} \sum_{S_1} \frac{m_i}{M_1} \, \text{length}_i'. $$
The sums on the right-hand side are at least as large as they would be if we used an optimal code for messages of sizes $M_0$ and $M_1$, so
$$ A_{M+1} \ge 1 + \frac{M_0}{M+1} A_{M_0} + \frac{M_1}{M+1} A_{M_1}, $$
and we can then apply the induction hypothesis to bound the average length by the entropy for messages of sizes $M_0$, $M_1$. That is,
$$ A_{M+1} \ge 1 + \frac{M_0}{M+1} A_{M_0} + \frac{M_1}{M+1} A_{M_1} \ge 1 + \frac{M_0}{M+1} \sum_{S_0} \frac{m_i}{M_0} \log \frac{M_0}{m_i} + \frac{M_1}{M+1} \sum_{S_1} \frac{m_i}{M_1} \log \frac{M_1}{m_i}. $$
This is the key observation, for now several algebraic miracles occur:
$$ 1 + \frac{M_0}{M+1} \sum_{S_0} \frac{m_i}{M_0} \log \frac{M_0}{m_i} + \frac{M_1}{M+1} \sum_{S_1} \frac{m_i}{M_1} \log \frac{M_1}{m_i} = 1 + \sum_{S_0} \frac{m_i}{M+1} \log \frac{M_0}{m_i} + \sum_{S_1} \frac{m_i}{M+1} \log \frac{M_1}{m_i} $$
$$ = 1 + \sum_{S} \frac{m_i}{M+1} \log \frac{1}{m_i} + \frac{M_0}{M+1} \log M_0 + \frac{M_1}{M+1} \log M_1 $$
$$ = H(S) + 1 - \log(M+1) + \frac{M_0}{M+1} \log M_0 + \frac{M_1}{M+1} \log M_1, $$
using in the last step that $\sum_{S} \frac{m_i}{M+1} \log \frac{1}{m_i} = H(S) - \log(M+1)$.

To complete the proof of the theorem, that $A_{M+1} \ge H(S)$, we have to show that
$$ 1 + \log \frac{1}{M+1} + \frac{M_0}{M+1} \log M_0 + \frac{M_1}{M+1} \log M_1 \ge 0. $$

Write
$$ x = \frac{M_0}{M+1}, \quad \text{and hence} \quad 1 - x = \frac{M_1}{M+1}, $$
since $M_0 + M_1 = M + 1$. Then what we need is the inequality
$$ 1 + x \log x + (1 - x) \log(1 - x) \ge 0, \quad \text{for } 0 < x < 1. $$
This is a calculus problem, the same one as finding the maximum of the entropy of a two-symbol source. The function has a unique minimum when $x = 1/2$ and its value there is zero. The induction is complete, and so is the proof of Shannon's source coding theorem.

Crossing the Channel

Shannon's greatest contribution was to apply his measure of information to the question of communication in the presence of noise. In fact, in Shannon's theory a communication channel can be abstracted to a system of inputs and outputs, where the outputs depend probabilistically on the inputs. Noise is present in any real system, and its effects are felt in what happens, probabilistically, to the inputs when they become outputs.

In the early days (the early 1940's) it was naturally assumed that increasing the transmission rate (now thought of in terms of bits per second, though they didn't use bits then) across a given communications channel increased the probability of errors in transmission. Shannon proved that this was not true, provided that the transmission rate was below what he termed the channel capacity. One way of stating Shannon's theorem is:

Shannon's Channel Coding Theorem. A channel with capacity C is capable, with suitable coding, of transmitting at any rate less than C bits per symbol with vanishingly small probability of error. For rates greater than C the probability of error cannot be made arbitrarily small.

There's a lot to define here and we have to limit ourselves to just that, the definitions. This is not an easy theorem and the proof is not constructive, i.e., Shannon doesn't tell us how to find the coding scheme, only that a coding scheme exists.

The Channel: Mutual Information

Here's the set-up. Suppose we have messages $s_1, s_2, \dots, s_N$ to select from, and send, at one end of a communication channel, and messages $r_1, r_2, \dots, r_M$ that are received at the other end. We need not have $N = M$. At the one end, the $s_j$'s are selected according to some probability distribution; say the probability that $s_j$ is selected is $P(s_j)$. Shannon's idea was to model the abstract relationship between the two ends of a channel as a set of conditional probabilities. So let's say a few words about that.

Conditional Probabilities and Bayes' Theorem. Conditional probabilities are introduced to deal with the fact that the occurrence of one event may influence the occurrence of another. We use the notation
$$ P(A \mid B) = \text{the probability that } A \text{ occurs given that } B \text{ has occurred.} $$

This is often read simply as "the probability of $A$ given $B$." This is not the same as the probability of $A$ and $B$ occurring. For conditional probability we watch for $B$ and then look for $A$. For example:
$$ A = \text{"The sidewalk is wet."} \qquad B = \text{"It is raining."} $$
Evidently $P(A \mid B)$ is not the same as $P(A, B)$ (meaning the probability of $A$ and $B$). It's also evident that $P(A \mid B) \ne P(B \mid A)$. A formula for the conditional probability $P(A \mid B)$ is
$$ P(A \mid B) = \frac{P(A, B)}{P(B)}. $$
One way to think of this is as follows. Imagine doing an experiment $N$ times (a large number) where both $A$ and $B$ can be observed outcomes. Then the probability that $A$ occurs given that $B$ has occurred is approximately the fraction of times that $A$ occurs in the runs of the experiment when $B$ has occurred, i.e.,
$$ P(A \mid B) \approx \frac{\text{number of occurrences of } A \text{ and } B}{\text{number of occurrences of } B}. $$
Now write this as
$$ P(A \mid B) \approx \frac{\text{number of occurrences of } A \text{ and } B}{\text{number of occurrences of } B} = \frac{(\text{number of occurrences of } A \text{ and } B)/N}{(\text{number of occurrences of } B)/N} \approx \frac{P(A \text{ and } B)}{P(B)}. $$
If $A$ and $B$ are independent events then $P(A \text{ and } B) = P(A)P(B)$, and so
$$ P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = P(A), $$
which makes sense: if $A$ and $B$ are independent, and you're $A$, who cares whether $B$ happened. Interchanging $A$ and $B$ we also then have
$$ P(B \mid A) = \frac{P(B, A)}{P(A)}. $$
We can also write these formulas as
$$ P(A, B) = P(A \mid B) P(B), \qquad P(B, A) = P(B \mid A) P(A). $$
But we do have the symmetry $P(A, B) = P(B, A)$, and so
$$ P(A \mid B) P(B) = P(B \mid A) P(A), $$
or
$$ \frac{P(A \mid B)}{P(B \mid A)} = \frac{P(A)}{P(B)}. $$
This is known as Bayes' Theorem. It's a workhorse in probability problems.
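A tiny numerical illustration of Bayes' theorem (my own example, with made-up numbers for the rain and wet-sidewalk events): given $P(B)$, $P(A \mid B)$ and $P(A)$, recover $P(B \mid A)$.

    # Hypothetical numbers: B = "it is raining", A = "the sidewalk is wet".
    P_B = 0.3          # P(B)
    P_A_given_B = 0.9  # P(A | B)
    P_A = 0.35         # P(A)

    # Bayes' theorem: P(B | A) = P(A | B) P(B) / P(A)
    P_B_given_A = P_A_given_B * P_B / P_A
    print(P_B_given_A)   # about 0.771: seeing a wet sidewalk makes rain much more likely

    # Consistency check of the symmetry P(A | B) P(B) = P(B | A) P(A)
    assert abs(P_A_given_B * P_B - P_B_given_A * P_A) < 1e-12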

The Channel as Conditional Probabilities. Once again, we have a set of messages to send, $s_1, s_2, \dots, s_N$, and a set of messages that are received, $r_1, r_2, \dots, r_M$. To specify the channel is to specify the conditional probabilities
$$ P(r_k \mid s_j) = \text{the probability that } r_k \text{ was received given that } s_j \text{ was sent.} $$
According to the formula for conditional probabilities,
$$ P(r_k \mid s_j) = \frac{P(r_k, s_j)}{P(s_j)}. $$
Think of this as answering the question: given the input $s_j$, how probable is it that the output $r_k$ results?

You might find it helpful to think of this characterization of the channel as an $M \times N$ matrix. That is, write
$$ P = \begin{pmatrix} P(r_1 \mid s_1) & P(r_1 \mid s_2) & \cdots & P(r_1 \mid s_N) \\ P(r_2 \mid s_1) & P(r_2 \mid s_2) & \cdots & P(r_2 \mid s_N) \\ \vdots & & & \vdots \\ P(r_M \mid s_1) & P(r_M \mid s_2) & \cdots & P(r_M \mid s_N) \end{pmatrix}. $$
The $k$th row,
$$ \begin{pmatrix} P(r_k \mid s_1) & P(r_k \mid s_2) & P(r_k \mid s_3) & \cdots & P(r_k \mid s_N) \end{pmatrix}, $$
is all about what might have led to the received message $r_k$: it might have come from $s_1$ with probability $P(r_k \mid s_1)$; it might have come from $s_2$ with probability $P(r_k \mid s_2)$; it might have come from $s_3$ with probability $P(r_k \mid s_3)$, and so on. On the other hand, each transmitted message has to be received as something, so observe that each column of the matrix sums to 1:
$$ \sum_{k=1}^{M} P(r_k \mid s_j) = 1. $$
If $s_j$ and $r_k$ are independent, then
$$ P(r_k \mid s_j) = \frac{P(r_k, s_j)}{P(s_j)} = \frac{P(r_k) P(s_j)}{P(s_j)} = P(r_k), $$
and the channel matrix looks like
$$ P = \begin{pmatrix} P(r_1) & P(r_1) & \cdots & P(r_1) \\ P(r_2) & P(r_2) & \cdots & P(r_2) \\ \vdots & & & \vdots \\ P(r_M) & P(r_M) & \cdots & P(r_M) \end{pmatrix}. $$
What's the channel matrix for the binary symmetric channel? Here $\{s_1, s_2\} = \{0, 1\}$ and $\{r_1, r_2\}$ likewise is $\{0, 1\}$, and
$$ P = \begin{pmatrix} p & 1 - p \\ 1 - p & p \end{pmatrix}. $$
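A small numerical sketch (mine) of the channel matrix for the binary symmetric channel: build the 2 x 2 matrix for a hypothetical crossover setting and check that each column, one per input symbol, sums to 1.

    import numpy as np

    p = 0.9          # hypothetical probability that a transmitted bit arrives unchanged

    # Channel matrix: rows indexed by received symbol r_k, columns by sent symbol s_j,
    # entry [k, j] = P(r_k | s_j).
    P = np.array([[p, 1 - p],
                  [1 - p, p]])

    # Each input must be received as something, so every column sums to 1.
    print(P.sum(axis=0))    # [1. 1.]

    # For a channel whose output is independent of its input, every column is the same:
    P_indep = np.array([[0.5, 0.5],
                        [0.5, 0.5]])
    print(P_indep.sum(axis=0))   # [1. 1.]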

Probability in, probability out. The $s_j$'s are selected with some probability $P(s_j)$. Consequently, the received messages also occur with some probability, say $P(r_k)$. One can show (via Bayes' theorem) that this is given by
$$ \begin{pmatrix} P(r_1) \\ P(r_2) \\ \vdots \\ P(r_M) \end{pmatrix} = \begin{pmatrix} P(r_1 \mid s_1) & P(r_1 \mid s_2) & \cdots & P(r_1 \mid s_N) \\ P(r_2 \mid s_1) & P(r_2 \mid s_2) & \cdots & P(r_2 \mid s_N) \\ \vdots & & & \vdots \\ P(r_M \mid s_1) & P(r_M \mid s_2) & \cdots & P(r_M \mid s_N) \end{pmatrix} \begin{pmatrix} P(s_1) \\ P(s_2) \\ \vdots \\ P(s_N) \end{pmatrix}. $$

Mutual Information and Entropy. Next let's look at things from the receiving end. Here the relevant quantity is
$$ P(s_j \mid r_k) = \text{the probability that the message } s_j \text{ was sent given that the message } r_k \text{ was received.} $$
Now, the message $s_j$ had an a priori probability $P(s_j)$ of being selected. On receiving $r_k$ there is thus a change in status of the probability associated with $s_j$, from an a priori $P(s_j)$ to an a posteriori $P(s_j \mid r_k)$ (after reception of $r_k$). By the same token, according to how we think of information (quantitatively), we can say that there has been a change of status of the information associated with $s_j$ on receiving $r_k$, from an a priori $\log(1/P(s_j))$ to an a posteriori $\log(1/P(s_j \mid r_k))$. The former, $\log(1/P(s_j))$, is what we've always called the information associated with $s_j$; the latter, $\log(1/P(s_j \mid r_k))$, is something we haven't used or singled out, but it's along the same lines. What's really important here is the change in the information associated with $s_j$, and so we set
$$ I(s_j, r_k) = \log \frac{1}{P(s_j)} - \log \frac{1}{P(s_j \mid r_k)} = \log \frac{P(s_j \mid r_k)}{P(s_j)}, $$
and call this the mutual information of $s_j$ given $r_k$. It is a measure of the (amount of) information the channel transmits about $s_j$ if $r_k$ is received.

If $s_j$ and $r_k$ are independent then $P(s_j \mid r_k) = P(s_j)$. Intuitively, there has been no change of status in the probability associated with $s_j$. To say it differently, there has been nothing more learned about $s_j$ as a result of receiving $r_k$, or there has been no change in the information associated with $s_j$ as a result of receiving $r_k$. In terms of mutual information, this is exactly the statement that $I(s_j, r_k) = 0$: the mutual information is zero. The mutual information is only positive when $P(s_j \mid r_k) > P(s_j)$, i.e., when we are more certain of $s_j$ by virtue of having received $r_k$.

Just as the entropy of a source proved to be a more widely useful concept than the information of a particular message, here too it is useful to define a system mutual information by finding an average of the mutual informations of the individual messages. This depends on all of the source messages, $S$, and all of the received messages, $R$. It can be expressed in terms of the entropy $H(S)$ of the source $S$, the entropy $H(R)$ of the received messages $R$, both of which use our usual definition of entropy, and a conditional entropy $H(S \mid R)$, which is something new. For completeness, the conditional entropy is defined by
$$ H(S \mid R) = \sum_{j,k} P(s_j \text{ and } r_k) \log \frac{1}{P(s_j \mid r_k)}. $$

This isn't so hard to motivate, but enough is enough; it shouldn't be hard to believe there's something like a conditional entropy, and let's just use it to define the system mutual information, which is what we really want. This is
$$ I(S, R) = H(S) - H(S \mid R). $$
This does look very much like an averaged form of the mutual information above. In fact, one can show
$$ I(S, R) = \sum_{j,k} P(r_k, s_j) \log \frac{P(r_k, s_j)}{P(r_k) P(s_j)}. $$
And if the mutual information $I(s_j, r_k)$ is supposed to measure the amount of information the channel transmits about $s_j$ given that $r_k$ was received, then $I(S, R)$ should be a measure of the amount of information (on average) of all the source messages $S$ that the channel transmits given all the received messages $R$. It's a measure of the change of status of the entropy of the source due to the conditional probabilities between the source messages and received messages: the difference between the uncertainty in the source before transmission and the uncertainty in the source after the messages have been received.

Channel Capacity. So now, finally, what is the channel capacity? It's the most information a channel can ever transmit. This Shannon defines to be
$$ C = \max I(S, R), $$
where the maximum is taken over the possible probabilities for the source messages $S$. (That is, the maximum is taken over all possible ways, in terms of probabilities, of selecting the source messages.) It is in terms of this that Shannon proved his famous channel coding theorem. We state it again:

Shannon's Channel Coding Theorem. A channel with capacity C is capable, with suitable coding, of transmitting at any rate less than C bits per symbol with vanishingly small probability of error. For rates greater than C the probability of error cannot be made arbitrarily small.

Whew! I must report that we do not have the tools available for a proof.

An example: The symmetric binary channel

We send 0's and 1's, and only 0's and 1's. In a perfect world there is never an error in transmission. In an imperfect world, errors might occur in the process of sending our message: a 1 becomes a 0, or vice versa. Let's say that, with probability $p$, a transmitted 1 is received as a 1. Then, of course, with probability $1 - p$ a transmitted 1 is received as a 0. Symmetrically, let's say that the same thing applies to transmitting a 0: with probability $p$ a transmitted 0 is received as a 0, and with probability $1 - p$ a transmitted 0 is received as a 1. This describes the channel completely. It's called a binary symmetric channel, and it's represented schematically by the following drawing.

[Diagram of the binary symmetric channel: each input 0 or 1 passes through unchanged with probability $p$ and is flipped with probability $1 - p$.]
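Since the binary symmetric channel has just been described, here is a numerical check (my own sketch, with a hypothetical input distribution) that the two expressions for $I(S, R)$ given above agree: once as $H(S) - H(S \mid R)$ and once as the sum of $P(r, s) \log [P(r, s)/(P(r)P(s))]$.

    import numpy as np

    p = 0.9                                   # hypothetical P(correct transmission)
    channel = np.array([[p, 1 - p],           # channel[k, j] = P(r_k | s_j)
                        [1 - p, p]])
    P_s = np.array([0.7, 0.3])                # hypothetical input distribution P(s_j)

    joint = channel * P_s                     # joint[k, j] = P(r_k, s_j) = P(r_k | s_j) P(s_j)
    P_r = joint.sum(axis=1)                   # P(r_k): "probability in, probability out"
    P_s_given_r = joint / P_r[:, None]        # P(s_j | r_k) = P(r_k, s_j) / P(r_k)

    H_S = -np.sum(P_s * np.log2(P_s))                       # entropy of the source
    H_S_given_R = -np.sum(joint * np.log2(P_s_given_r))     # conditional entropy H(S | R)
    I_1 = H_S - H_S_given_R

    I_2 = np.sum(joint * np.log2(joint / (P_r[:, None] * P_s)))  # direct formula

    print(I_1, I_2)   # the two values agree (about 0.46 bits for these numbers)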

This is a case where the channel capacity can be computed exactly, and the result is
$$ C = 1 - H, \qquad \text{where } H = p \log \frac{1}{p} + (1 - p) \log \frac{1}{1 - p}. $$
The maximum of the mutual information between the sent and received messages (as in the definition of capacity) is attained when the two symbols are equally probable. That's probability 1/2, and 0 and 1 each have information content 1. It's as if, in a perfect world, the 0 and 1 were each worth a full bit before we sent them, but the noise in the channel, measured by the probability of error in transmission, takes away $H$ bits, leaving us with only $1 - H$ bits' worth of information in transmitting each 0 or 1. That's a limit on the reliability of the channel to transmit information. Even this is not a trivial calculation, however.
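To close the loop, here is a brute-force check (my own sketch, not part of the notes) that maximizing the system mutual information of the binary symmetric channel over input distributions really does give $C = 1 - H$, with the maximum at the equally probable input distribution.

    import numpy as np

    def mutual_information(channel, P_s):
        """I(S, R) = sum of P(r, s) log2[ P(r, s) / (P(r) P(s)) ] for a channel matrix
        with entries P(r_k | s_j) and an input distribution P(s_j)."""
        joint = channel * P_s
        P_r = joint.sum(axis=1)
        mask = joint > 0
        return np.sum(joint[mask] * np.log2(joint[mask] / np.outer(P_r, P_s)[mask]))

    p = 0.9                                   # hypothetical P(correct transmission)
    channel = np.array([[p, 1 - p],
                        [1 - p, p]])

    # Scan over input distributions (q, 1 - q) and take the best.
    qs = np.linspace(0.001, 0.999, 999)
    I_values = [mutual_information(channel, np.array([q, 1 - q])) for q in qs]
    best = int(np.argmax(I_values))

    H = p * np.log2(1 / p) + (1 - p) * np.log2(1 / (1 - p))
    print("capacity by search:", max(I_values), "at q =", qs[best])  # about 0.531 at q = 0.5
    print("1 - H             :", 1 - H)                              # the same value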


More information

Quantum Entanglement. Chapter Introduction. 8.2 Entangled Two-Particle States

Quantum Entanglement. Chapter Introduction. 8.2 Entangled Two-Particle States Chapter 8 Quantum Entanglement 8.1 Introduction In our final chapter on quantum mechanics we introduce the concept of entanglement. This is a feature of two-particle states (or multi-particle states) in

More information

Lesson 11-1: Parabolas

Lesson 11-1: Parabolas Lesson -: Parabolas The first conic curve we will work with is the parabola. You may think that you ve never really used or encountered a parabola before. Think about it how many times have you been going

More information

Chapter 2. Mathematical Reasoning. 2.1 Mathematical Models

Chapter 2. Mathematical Reasoning. 2.1 Mathematical Models Contents Mathematical Reasoning 3.1 Mathematical Models........................... 3. Mathematical Proof............................ 4..1 Structure of Proofs........................ 4.. Direct Method..........................

More information

(Refer Slide Time: 0:21)

(Refer Slide Time: 0:21) Theory of Computation Prof. Somenath Biswas Department of Computer Science and Engineering Indian Institute of Technology Kanpur Lecture 7 A generalisation of pumping lemma, Non-deterministic finite automata

More information

Linear Classifiers and the Perceptron

Linear Classifiers and the Perceptron Linear Classifiers and the Perceptron William Cohen February 4, 2008 1 Linear classifiers Let s assume that every instance is an n-dimensional vector of real numbers x R n, and there are only two possible

More information

Intro to Information Theory

Intro to Information Theory Intro to Information Theory Math Circle February 11, 2018 1. Random variables Let us review discrete random variables and some notation. A random variable X takes value a A with probability P (a) 0. Here

More information

Lecture 10: Powers of Matrices, Difference Equations

Lecture 10: Powers of Matrices, Difference Equations Lecture 10: Powers of Matrices, Difference Equations Difference Equations A difference equation, also sometimes called a recurrence equation is an equation that defines a sequence recursively, i.e. each

More information

Discrete Probability and State Estimation

Discrete Probability and State Estimation 6.01, Fall Semester, 2007 Lecture 12 Notes 1 MASSACHVSETTS INSTITVTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.01 Introduction to EECS I Fall Semester, 2007 Lecture 12 Notes

More information

Lecture 8: Shannon s Noise Models

Lecture 8: Shannon s Noise Models Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007) Lecture 8: Shannon s Noise Models September 14, 2007 Lecturer: Atri Rudra Scribe: Sandipan Kundu& Atri Rudra Till now we have

More information

CS 350 Algorithms and Complexity

CS 350 Algorithms and Complexity 1 CS 350 Algorithms and Complexity Fall 2015 Lecture 15: Limitations of Algorithmic Power Introduction to complexity theory Andrew P. Black Department of Computer Science Portland State University Lower

More information

Properties of Arithmetic

Properties of Arithmetic Excerpt from "Prealgebra" 205 AoPS Inc. 4 6 7 4 5 8 22 23 5 7 0 Arithmetic is being able to count up to twenty without taking o your shoes. Mickey Mouse CHAPTER Properties of Arithmetic. Why Start with

More information

Shannon s Noisy-Channel Coding Theorem

Shannon s Noisy-Channel Coding Theorem Shannon s Noisy-Channel Coding Theorem Lucas Slot Sebastian Zur February 2015 Abstract In information theory, Shannon s Noisy-Channel Coding Theorem states that it is possible to communicate over a noisy

More information

Expected Value II. 1 The Expected Number of Events that Happen

Expected Value II. 1 The Expected Number of Events that Happen 6.042/18.062J Mathematics for Computer Science December 5, 2006 Tom Leighton and Ronitt Rubinfeld Lecture Notes Expected Value II 1 The Expected Number of Events that Happen Last week we concluded by showing

More information

CHAPTER 12 Boolean Algebra

CHAPTER 12 Boolean Algebra 318 Chapter 12 Boolean Algebra CHAPTER 12 Boolean Algebra SECTION 12.1 Boolean Functions 2. a) Since x 1 = x, the only solution is x = 0. b) Since 0 + 0 = 0 and 1 + 1 = 1, the only solution is x = 0. c)

More information

6.042/18.062J Mathematics for Computer Science November 28, 2006 Tom Leighton and Ronitt Rubinfeld. Random Variables

6.042/18.062J Mathematics for Computer Science November 28, 2006 Tom Leighton and Ronitt Rubinfeld. Random Variables 6.042/18.062J Mathematics for Computer Science November 28, 2006 Tom Leighton and Ronitt Rubinfeld Lecture Notes Random Variables We ve used probablity to model a variety of experiments, games, and tests.

More information

Modular numbers and Error Correcting Codes. Introduction. Modular Arithmetic.

Modular numbers and Error Correcting Codes. Introduction. Modular Arithmetic. Modular numbers and Error Correcting Codes Introduction Modular Arithmetic Finite fields n-space over a finite field Error correcting codes Exercises Introduction. Data transmission is not normally perfect;

More information

Chapter 2. A Look Back. 2.1 Substitution ciphers

Chapter 2. A Look Back. 2.1 Substitution ciphers Chapter 2 A Look Back In this chapter we take a quick look at some classical encryption techniques, illustrating their weakness and using these examples to initiate questions about how to define privacy.

More information

Information & Correlation

Information & Correlation Information & Correlation Jilles Vreeken 11 June 2014 (TADA) Questions of the day What is information? How can we measure correlation? and what do talking drums have to do with this? Bits and Pieces What

More information

Classification & Information Theory Lecture #8

Classification & Information Theory Lecture #8 Classification & Information Theory Lecture #8 Introduction to Natural Language Processing CMPSCI 585, Fall 2007 University of Massachusetts Amherst Andrew McCallum Today s Main Points Automatically categorizing

More information

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 10

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 10 EECS 70 Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 10 Introduction to Basic Discrete Probability In the last note we considered the probabilistic experiment where we flipped

More information

Algebra & Trig Review

Algebra & Trig Review Algebra & Trig Review 1 Algebra & Trig Review This review was originally written for my Calculus I class, but it should be accessible to anyone needing a review in some basic algebra and trig topics. The

More information

Precalculus idea: A picture is worth 1,000 words

Precalculus idea: A picture is worth 1,000 words Six Pillars of Calculus by Lorenzo Sadun Calculus is generally viewed as a difficult subject, with hundreds of formulas to memorize and many applications to the real world. However, almost all of calculus

More information

AN ALGEBRA PRIMER WITH A VIEW TOWARD CURVES OVER FINITE FIELDS

AN ALGEBRA PRIMER WITH A VIEW TOWARD CURVES OVER FINITE FIELDS AN ALGEBRA PRIMER WITH A VIEW TOWARD CURVES OVER FINITE FIELDS The integers are the set 1. Groups, Rings, and Fields: Basic Examples Z := {..., 3, 2, 1, 0, 1, 2, 3,...}, and we can add, subtract, and multiply

More information

1 Introduction. Sept CS497:Learning and NLP Lec 4: Mathematical and Computational Paradigms Fall Consider the following examples:

1 Introduction. Sept CS497:Learning and NLP Lec 4: Mathematical and Computational Paradigms Fall Consider the following examples: Sept. 20-22 2005 1 CS497:Learning and NLP Lec 4: Mathematical and Computational Paradigms Fall 2005 In this lecture we introduce some of the mathematical tools that are at the base of the technical approaches

More information

Information in Biology

Information in Biology Lecture 3: Information in Biology Tsvi Tlusty, tsvi@unist.ac.kr Living information is carried by molecular channels Living systems I. Self-replicating information processors Environment II. III. Evolve

More information