Mathematics of the Information Age
Material ages
- The Stone Age: until about 4000 BC
- The Bronze Age: 2300 BC to 500 BC
- The Iron Age: 800 BC to 100 AD
The Information Age: begins in 1948 with the work of Claude Shannon at Bell Labs
What do the codes used for sending messages back from spacecraft have in common with genes on a molecule of DNA? How is it that the second law of thermodynamics, a physicist's discovery, is related to communication? Why are the knotty problems in the mathematical theory of probability connected with the way we express ourselves in speech and communication? The answer to all of these questions is information. Jeremy Campbell, Grammatical Man, 1982
I shall argue that this information flow, not energy per se, is the prime mover of life: that molecular information flowing in circles brings forth the organization we call organism and maintains it against the ever-present disorganizing pressures in the physical universe. So viewed, the information circle becomes the unit of life. Werner Loewenstein, The Touchstone of Life, 2000
Aspects of Information?
- Practical
- Perceptual
- Physical
All have something to do with communication.
Aspects of information: the theory. The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is, they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. Claude Shannon, A Mathematical Theory of Communication, 1948
Prior condition for communication to be possible: the sender and receiver both have to have the same set of all possible messages, or be able to construct it. They need the same codebook.
The most famous codebook in history?
How do we measure information? (In Shannon's theory, information becomes quantitative.)
Remember Shannon's quote: The significant aspect is that the actual message is one selected from a set of possible messages. How do we quantify the process of selection?
Let's play 20 questions! I'm thinking of a famous person. (But remember, we both know all the famous people.)
1. The person is Brad Osgood
2. The person is Rebecca Osgood
3. The person is Miles Osgood
4. The person is Madeleine Osgood
5. The person is Ruth Osgood
6. The person is Herbert Osgood
7. The person is Lynn Osgood
8. The person is Alex Beasley
9. The person is Thomas Faxon
10. The person is Virginia Faxon
11. The person is Thomas Faxon, Jr.
12. The person is Meer Deiters
13. The person is Francisca Faxon
14. The person is Pia Faxon
15. The person is George W. Bush
16. The person is Saddam Hussein
Brad says: Who needs 20 questions? I bet I can pick out any object (in English) by asking 18 questions. OK, maybe 19. Hah! What is the basis for this bold claim? Is it justified? In the real version of 20 questions the sender says whether the object is animal, mineral, or vegetable, to allow the receiver to narrow down their questions. Just how many things can you determine by asking 20 questions?
2^18 = 262,144
2^19 = 524,288
2^20 = 1,048,576
The number of entries in the 1989 edition of the Oxford English Dictionary is 291,500. Since 262,144 < 291,500 < 524,288, 18 questions are not quite enough to single out an entry, but 19 are.
Impress your friends: I can pick any name out of the Stanford phone book in N questions.
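What makes the boast work is binary search: each yes/no question halves the set of remaining candidates. A minimal sketch in Python (the name list here is a hypothetical stand-in for the phone book; not from the slides):

```python
import math

def pick_by_questions(names, secret):
    """Identify `secret` from a sorted list using only yes/no
    questions of the form "does the name come before names[mid]?"."""
    lo, hi = 0, len(names)            # candidates are names[lo:hi]
    questions = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        questions += 1                # one yes/no question per halving
        if secret < names[mid]:
            hi = mid
        else:
            lo = mid
    return names[lo], questions

directory = sorted(["Brad Osgood", "Rebecca Osgood", "Miles Osgood",
                    "Madeleine Osgood", "Ruth Osgood", "Herbert Osgood"])
print(pick_by_questions(directory, "Miles Osgood"))   # ('Miles Osgood', 2)
print(math.ceil(math.log2(len(directory))))           # never more than 3
```

So N is the ceiling of log_2 of the number of listings; e.g., a hypothetical 50,000-name directory would need at most 16 questions.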
The unit of information is the bit. How many bits, that is, how many yes/no questions, are needed to select one particular message from a set of possible messages? The possible messages are encoded into sequences of bits: in practice, 0's and 1's (off/on; no/yes). Many coding schemes are possible, some more efficient or reliable than others. There are many ways to play 20 questions.
General definition of amount of information: suppose there are N possible messages. The amount of information in any particular message is I = log_2 N (the unit is bits); this is the same thing as saying 2^I = N. What does it mean to say that the amount of information in a message is, e.g., 3.45 bits?
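The definition is one line of code. A minimal sketch (Python; not from the slides), showing how fractional bit counts arise whenever N is not a power of 2:

```python
import math

def information_bits(n_messages):
    """I = log2(N): the information in selecting one of N
    equally likely messages."""
    return math.log2(n_messages)

print(information_bits(16))   # 4.0 bits: the 16-person game above
print(information_bits(11))   # ~3.46 bits: no whole number of questions fits
```

One standard reading of a fractional answer: over many independent plays of the game, messages can be encoded in blocks so that the average number of yes/no questions per message approaches log_2 N.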
I'm more famous than you are. In any practical application not all messages are equally probable. How can we measure information taking probabilities into account?
1. The person is Brad Osgood
2. The person is Brad Osgood
3. The person is Brad Osgood
4. The person is Brad Osgood
5. The person is George W. Bush
6. The person is Saddam Hussein
7. The person is Colin Powell
8. The person is Condoleezza Rice
Playing the game many times, how many questions do you think you'd need, on average, to pick out a particular message?
Ask first: is the person in the group 1 through 4? If yes, one question has resolved the uncertainty: the person is Brad Osgood. If no, we need two more questions, for a total of three. Brad Osgood occurs 4 out of 8 times: probability 4/8 = 1/2, and I(Brad Osgood) = log_2 2 = 1 bit. Everybody else occurs 1 out of 8 times: probability 1/8, and I(George W. Bush) = log_2 8 = 3 bits.
In general, if a message S occurs with probability p, then I(S) = log_2(1/p). If we have N messages (the "source") S_1, S_2, ..., S_N occurring with probabilities p_1, p_2, ..., p_N, then the average information of the source as a whole (the "entropy" of the source) is the weighted average of the information of the individual messages:

H = p_1 log_2(1/p_1) + p_2 log_2(1/p_2) + ... + p_N log_2(1/p_N)
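A minimal sketch (Python; not from the slides) of the entropy computation, applied to the famous-person source above:

```python
import math

def entropy(probs):
    """Shannon entropy H = sum of p * log2(1/p), in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# "Brad Osgood" with probability 1/2; four other names at 1/8 each.
print(entropy([1/2, 1/8, 1/8, 1/8, 1/8]))   # 2.0 bits
print(entropy([1/8] * 8))                   # 3.0 bits if all equally likely
```

The ask-about-Brad-first strategy achieves this average exactly: one question half the time, three questions the other half, for (1/2)(1) + (1/2)(3) = 2 questions on average, versus 3 for eight equally likely messages.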
Can you improve your estimate on how many questions it should take to pick a name out of the Stanford phone book?
Shannon defined entropy as a measure of average information in a source (the collection of possible messages), taking probabilities into account:

H = p_1 log_2(1/p_1) + p_2 log_2(1/p_2) + ... + p_N log_2(1/p_N)

And he proved:
Noiseless Source Coding Theorem: For any coding scheme the average length of a codeword is at least the entropy. This gives a lower bound on our cleverness.
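The bound is also essentially achievable. As an illustration (the slides don't name it): Huffman's standard construction builds a code by repeatedly merging the two least probable messages, and on the famous-person source above its average codeword length meets the 2-bit entropy exactly:

```python
import heapq

def huffman_lengths(probs):
    """Codeword lengths from the standard Huffman construction:
    repeatedly merge the two least probable nodes; every symbol
    inside a merged node gets one bit longer."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, ids1 = heapq.heappop(heap)
        p2, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, ids1 + ids2))
    return lengths

probs = [1/2, 1/8, 1/8, 1/8, 1/8]
lengths = huffman_lengths(probs)
print(lengths)                                     # [1, 3, 3, 3, 3]
print(sum(p * l for p, l in zip(probs, lengths)))  # 2.0 = the entropy
```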
Shannon defined the capacity of a channel as a measure of how much information it could transmit. And he proved:
Channel Coding Theorem: A channel with capacity C is capable, with suitable coding, of transmitting at any rate less than C bits per symbol with vanishingly small probability of error. For rates greater than C the probability of error cannot be made arbitrarily small.
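Shannon's proof is famously nonconstructive: it says good codes exist without exhibiting them. A toy illustration (not from the slides) of the rate-versus-reliability trade: a 3x repetition code over a binary symmetric channel that flips each bit with probability p cuts the error rate from p to 3p^2(1-p) + p^3, but at a rate of only 1/3 bit per symbol, well below the capacity C = 1 - H(p) (H the entropy of a two-message source) that the theorem says is approachable.

```python
import random

def bsc(bits, p, rng):
    """Binary symmetric channel: flip each bit with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

def encode(bits):
    return [b for b in bits for _ in range(3)]        # send each bit 3 times

def decode(bits):
    return [int(sum(bits[i:i + 3]) >= 2)              # majority vote
            for i in range(0, len(bits), 3)]

rng = random.Random(0)
p, n = 0.1, 100_000
msg = [rng.randint(0, 1) for _ in range(n)]
out = decode(bsc(encode(msg), p, rng))
print(sum(a != b for a, b in zip(msg, out)) / n)      # ~0.028, down from 0.1
```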
Most great physical and mathematical discoveries seem trivial after you understand them. You say to yourself: I could have done that. But as I hold the tattered journal containing Claude Shannon's classic 1948 paper A Mathematical Theory of Communication I see yellowed pages filled with vacuum tubes and mechanisms of yesteryear, and I know I could never have conceived the insightful theory of information shining through these glossy pages of archaic font. I know of no greater work of genius in the annals of technological thought. Robert W. Lucky, Silicon Dreams, 1989
The course syllabus
Analog signal (e.g., music, speech, images)
→ A-to-D converter → digitized signal (0's and 1's)
→ compression (e.g., MP3)
→ add error correction (e.g., fixes scratches in CDs)
→ the channel (e.g., fiber optics, the Internet, computer memory), where noise strikes!
→ correct errors (remove redundancy)
→ uncompress
→ D-to-A converter
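A toy sketch (Python; hypothetical stages, not the course's code) of the two ends of this chain: quantize an analog sample to bits on the way in, reconstruct it on the way out. (The error-correction step in the middle is the kind of thing sketched after the channel coding theorem above.)

```python
import math

LEVELS = 16                     # 4-bit quantization

def a_to_d(x):
    """Quantize a sample in [-1, 1] to one of 16 levels."""
    return min(int((x + 1) / 2 * LEVELS), LEVELS - 1)

def d_to_a(q):
    """Reconstruct the center of the quantization cell."""
    return (q + 0.5) / LEVELS * 2 - 1

samples = [math.sin(2 * math.pi * t / 32) for t in range(32)]
digitized = [a_to_d(x) for x in samples]           # 4 bits per sample
restored = [d_to_a(q) for q in digitized]
print(max(abs(a - b) for a, b in zip(samples, restored)))  # at most 1/16
```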
It took a while for the technology to catch up with Shannon's theory.
The news from Troy. In Agamemnon by Aeschylus, the fall of Troy was signaled by a beacon. The play opens with a watchman who waited for 12 years for a single piece of news: "the promised sign, the beacon flare to speak from Troy and utter one word, 'Victory!'"
The news from Gondor
The news from Paris A message was spelled out, symbol by symbol, and relayed from one station to the next. Operators at intermediate stations were allowed to know only portions of the codebook. The full codebook, which had over 25,000 entries, was given only to the inspectors.
The network, 1820s–1850s
High tech of the mid-19th century:
1824: Samuel F. B. Morse, an art instructor, learns about electromagnetism
1831: Joseph Henry demonstrates an electromagnetic telegraph with a one-mile run in Albany, New York
1837: Morse demonstrates his electric telegraph in New York
1837: Wheatstone and Cooke set up the British electric telegraph
Transatlantic cables around 1904
The first shot in the second industrial revolution: William Thomson (later Lord Kelvin, 1824–1907), On the theory of the electric telegraph, Proceedings of the Royal Society, 1855. It answered the question of why signals smear out over a long cable.
Communication became mathematical! Surely this must have been hailed as a breakthrough!
I believe nature knows no such application of this law and I can only regard it as a fiction of the schools; a forced and violent application of a principle in Physics, good and true under other circumstances, but misapplied here. Edward Whitehouse, chief electrician for the Atlantic Telegraph Company, speaking in 1856.
Right. The first transatlantic cable used Whitehouse's specifications, not Thomson's. The continents were joined August 5, 1858 (after four previous failed attempts). The first successful message was sent August 16. The cable failed three weeks later: Whitehouse had insisted on using high voltage, disregarding Thomson's analysis.
The rise of electrical networks: telegraph, telephone, and beyond
Broadway & John Street, New York 1890
Gerard Exchange, London, 1926
What s wrong with this picture?
Wireless: Guglielmo Marconi (1874–1937)
The last of the great data networks?
First we need a mathematical description of signals. What kinds of signals? Speech, music, images. All can be described via Fourier analysis.
Major Secret of the Universe: every signal has a spectrum
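A minimal numpy sketch (assuming numpy is available; not from the slides) of what this means in practice: the discrete Fourier transform of a sampled signal exposes the frequencies it is built from.

```python
import numpy as np

# A signal made of two tones, 5 Hz and 12 Hz, sampled at 100 Hz for 4 s.
fs, n = 100, 400
t = np.arange(n) / fs
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

spectrum = np.fft.rfft(signal)            # the signal's spectrum
freqs = np.fft.rfftfreq(n, d=1 / fs)      # frequency axis, in Hz
print(freqs[np.abs(spectrum) > n / 8])    # [ 5. 12.]: the two tones recovered
```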