Mathematics of the Information Age

Material ages
The Stone Age: from - to about 4000 BC
The Bronze Age: from 2300 BC to 500 BC
The Iron Age: from 800 BC to 100 AD

The Information Age Begins in 1948 with the work of Claude Shannon at Bell Labs

What do the codes used for sending messages back from spacecraft have in common with genes on a molecule of DNA? How is it that the second law of thermodynamics, a physicist's discovery, is related to communication? Why are the knotty problems in the mathematical theory of probability connected with the way we express ourselves in speech and communication? The answer to all of these questions is information. Jeremy Campbell, Grammatical Man, 1982

I shall argue that this information flow, not energy per se, is the prime mover of life: that molecular information flowing in circles brings forth the organization we call organism and maintains it against the ever-present disorganizing pressures in the physical universe. So viewed, the information circle becomes the unit of life. Werner Lowenstein, The Touchstone of Life, 2000

Aspects of Information?

Practical. Perceptual. Physical. All have something to do with communication.

Aspects of information: the theory. The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. Claude Shannon, A Mathematical Theory of Communication, 1948

Prior condition for communication to be possible: The sender and receiver both have to have the same set of all possible messages, or be able to construct it. They need the same codebook

The most famous codebook in history?

How do we measure information? (In Shannon's theory, information becomes quantitative.)

Remember Shannon's quote: The significant aspect is that the actual message is one selected from a set of possible messages. How to quantify the process of selection?

Let's play 20 questions! I'm thinking of a famous person. (But remember, we both know all the famous people.)

1. The person is Brad Osgood
2. The person is Rebecca Osgood
3. The person is Miles Osgood
4. The person is Madeleine Osgood
5. The person is Ruth Osgood
6. The person is Herbert Osgood
7. The person is Lynn Osgood
8. The person is Alex Beasley
9. The person is Thomas Faxon
10. The person is Virginia Faxon
11. The person is Thomas Faxon, Jr.
12. The person is Meer Deiters
13. The person is Francisca Faxon
14. The person is Pia Faxon
15. The person is George W. Bush
16. The person is Saddam Hussein

Brad says: Who needs 20 questions? I bet I can pick out any object (in English) by asking 18 questions. OK, maybe 19. Hah! What is the basis for this bold claim? Is it justified? In the real version of 20 questions, the sender says whether the object is animal, mineral, or vegetable, to let the receiver narrow down their questions. Just how many things can you determine by asking 20 questions?

2^18 = 262,144. 2^19 = 524,288. The number of entries in the 1989 edition of the Oxford English Dictionary is 291,500. 2^20 = 1,048,576.
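A quick sanity check on the 18-or-19-questions claim (a minimal sketch in Python, my own illustration; the dictionary figure is the one quoted above):

```python
import math

# Each yes/no question halves the set of possibilities, so q questions
# can distinguish at most 2**q objects.
for q in (18, 19, 20):
    print(q, "questions can distinguish up to", 2**q, "objects")

# Entry count of the 1989 Oxford English Dictionary, as quoted above.
oed_entries = 291_500
print("Questions needed:", math.ceil(math.log2(oed_entries)))  # 19
```

So 18 questions fall just short of the OED's 291,500 entries, but 19 are enough.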

Impress your friends. I can pick any name out of the Stanford phone book in N questions

The unit of information is the bit. How many bits, that is, how many yes-no questions, are needed to select one particular message from a set of possible messages? The possible messages are encoded into sequences of bits. In practice, 0's and 1's (off, on; no, yes). Many coding schemes are possible, some more efficient or reliable than others. There are many ways to play 20 questions.

General definition of amount of information: Suppose there are N possible messages. The amount of information in any particular message is I = log2 N (the unit is bits). (Same thing as saying 2^I = N.) What does it mean to say that the amount of information in a message is, e.g., 3.45 bits?
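One way to read a fractional value like 3.45 bits (my own numerical illustration, not from the slides): it is simply log2 N for a set whose size N is not a power of two.

```python
import math

# I = log2(N): information in one message chosen from N equally likely messages.
for n in (2, 8, 16, 11):
    print(f"N = {n:2d}  ->  I = {math.log2(n):.2f} bits")

# Going the other way: 3.45 bits corresponds to 2**3.45, roughly 10.9 messages,
# i.e. a set slightly too large to resolve with 3 yes/no questions.
print("2**3.45 =", 2 ** 3.45)
```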

I'm more famous than you are. In any practical application, not all messages are equally probable. How can we measure information taking probabilities into account?

1. The person is Brad Osgood
2. The person is Brad Osgood
3. The person is Brad Osgood
4. The person is Brad Osgood
5. The person is George W. Bush
6. The person is Saddam Hussein
7. The person is Colin Powell
8. The person is Condoleezza Rice
Playing the game many times, how many questions do you think you'd need, on average, to pick out a particular message?

Is the person in the group 1 through 4? If yes, one question resolves the uncertainty. If no, we need two more questions, for a total of three. Brad Osgood occurs 4 out of 8 times: probability 4/8 = 1/2, so I("Brad Osgood") = 1. Everybody else occurs 1 out of 8 times: probability 1/8, so I("George W.") = 3.

In general, if a message S occurs with probability p, then I(S) = log2(1/p). If we have N messages (the "source") S1, S2, ..., SN occurring with probabilities p1, p2, ..., pN, then the average information of the source as a whole (the "entropy" of the source) is the weighted average of the information of the individual messages:
H = p1 log2(1/p1) + p2 log2(1/p2) + ... + pN log2(1/pN)
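Tying the formula back to the eight-name example above, here is a minimal sketch (Python, my own illustration) of the entropy of that source: one name with probability 1/2 and four names with probability 1/8 each.

```python
import math

def entropy(probs):
    """Average information H = sum of p * log2(1/p), in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# The source from the slide: "Brad Osgood" occurs 4 times out of 8,
# each of the four other names occurs once.
print(entropy([4/8, 1/8, 1/8, 1/8, 1/8]))  # 2.0 bits: two questions on average
```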

Can you improve your estimate on how many questions it should take to pick a name out of the Stanford phone book?

Shannon defined entropy as a measure of average information in a source (the collection of possible messages), taking probabilities into account:
H = p1 log2(1/p1) + p2 log2(1/p2) + ... + pN log2(1/pN)
And he proved:

Noiseless Source Coding Theorem: For any coding scheme the average length of a codeword is at least the entropy. This gives a lower bound to our cleverness
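As a hedged illustration of the bound (my own example, not from the lecture): the eight-name source above has entropy 2 bits, and a prefix code that gives the likely name a 1-bit codeword and the rare names 3-bit codewords achieves an average length of exactly 2 bits, so no code can beat it.

```python
# A prefix code for the source above (hypothetical codeword assignment).
code = {
    "Brad Osgood":      "0",
    "George W. Bush":   "100",
    "Saddam Hussein":   "101",
    "Colin Powell":     "110",
    "Condoleezza Rice": "111",
}
probs = {
    "Brad Osgood": 4/8, "George W. Bush": 1/8, "Saddam Hussein": 1/8,
    "Colin Powell": 1/8, "Condoleezza Rice": 1/8,
}

# Average codeword length, weighted by message probability.
avg_len = sum(p * len(code[name]) for name, p in probs.items())
print(avg_len)  # 2.0 bits per message, matching the entropy
```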

Shannon defined the capacity of a channel as a measure of how much information it could transmit. And he proved:

Channel Coding Theorem: A channel with capacity C is capable, with suitable coding, of transmitting at any rate less than C bits per symbol with vanishingly small probability of error. For rates greater than C the probability of error cannot be made arbitrarily small.
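The slides do not give a formula for capacity, but a standard textbook case (my addition, as an illustration only) is the binary symmetric channel, which flips each transmitted bit with probability p and has capacity C = 1 - H(p) bits per symbol.

```python
import math

def binary_entropy(p):
    """H(p) = p*log2(1/p) + (1-p)*log2(1/(1-p))."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

# Capacity of a binary symmetric channel with crossover probability p.
for p in (0.0, 0.01, 0.1, 0.5):
    print(f"p = {p:4.2f}  ->  C = {1 - binary_entropy(p):.3f} bits/symbol")
```

A noiseless channel (p = 0) has capacity 1 bit per symbol; a channel that flips bits half the time carries no information at all.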

Most great physical and mathematical discoveries seem trivial after you understand them. You say to yourself: I could have done that. But as I hold the tattered journal containing Claude Shannon's classic 1948 paper A Mathematical Theory of Communication I see yellowed pages filled with vacuum tubes and mechanisms of yesteryear, and I know I could never have conceived the insightful theory of information shining through these glossy pages of archaic font. I know of no greater work of genius in the annals of technological thought. Robert W. Lucky, Silicon Dreams, 1989

The course syllabus

Analog signal (e.g., music, speech, images)
→ A-to-D converter → digitized signal (0s and 1s)
→ compression (e.g., MP3)
→ add error correction (e.g., fixes scratches in CDs)
→ the channel (e.g., fiber optics, the Internet, computer memory), where noise enters
→ correct errors (remove redundancy)
→ uncompress
→ D-to-A converter
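As a toy illustration of the "add error correction" and "correct errors" stages (my own sketch; real systems such as CDs use far more sophisticated codes): a 3x repetition code survives any single flipped bit per block by majority vote.

```python
import random

def encode(bits):
    """3x repetition code: transmit every bit three times."""
    return [b for b in bits for _ in range(3)]

def noisy_channel(bits, flip_prob=0.05):
    """Flip each transmitted bit independently with probability flip_prob."""
    return [b ^ 1 if random.random() < flip_prob else b for b in bits]

def decode(received):
    """Majority vote over each block of three received bits."""
    return [int(sum(received[i:i+3]) >= 2) for i in range(0, len(received), 3)]

message = [1, 0, 1, 1, 0, 0, 1, 0]
print(decode(noisy_channel(encode(message))))  # usually recovers the message
```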

It took a while for the technology to catch up with Shannon's theory.

The news from Troy: In Agamemnon by Aeschylus, the fall of Troy was signaled by a beacon. The play opens with a watchman who waited for 12 years for a single piece of news: "the promised sign, the beacon flare to speak from Troy and utter one word, 'Victory!'"

The news from Gondor

The news from Paris: A message was spelled out, symbol by symbol, and relayed from one station to the next. Operators at intermediate stations were allowed to know only portions of the codebook. The full codebook, which had over 25,000 entries, was given only to the inspectors.

The network, 1820s-1850s

High Tech of the mid-19th Century
1824: Samuel F.B. Morse, an art instructor, learns about electromagnetism.
1831: Joseph Henry demonstrates an electromagnetic telegraph with a one-mile run in Albany, New York.
1837: Morse demonstrates his electric telegraph in New York.
1837: Wheatstone and Cooke set up the British electric telegraph.
Transatlantic cables around 1904.

The first shot in the second industrial revolution: William Thomson (later Lord Kelvin, 1824-1907), "On the theory of the electric telegraph," Proceedings of the Royal Society, 1855. Answered the question of why signals smear out over a long cable.

Communication became mathematical! Surely this must have been hailed as a breakthrough!

I believe nature knows no such application of this law and I can only regard it as a fiction of the schools; a forced and violent application of a principle in Physics, good and true under other circumstances, but misapplied here. Edward Whitehouse, chief electrician for the Atlantic Telegraph Company, speaking in 1856.

Right. The first transatlantic cable used Whitehouse's specifications, not Thomson's. The continents were joined August 5, 1858 (after four previous failed attempts). The first successful message was sent August 16. The cable failed three weeks later. Whitehouse insisted on using high voltage, disregarding Thomson's analysis.

The rise of electrical networks: telegraph, telephone, and beyond

Broadway & John Street, New York 1890

Gerard Exchange, London, 1926

What's wrong with this picture?

Wireless: Guglielmo Marconi (1874-1937)

The last of the great data networks?

First, need a mathematical description of signals. What kinds of signals? Speech, music, images. All can be described via Fourier analysis.

Major Secret of the Universe: Every signal has a spectrum.
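A forward-looking sketch of what "every signal has a spectrum" means in practice (my own example, using NumPy; nothing here is from the lecture itself): sample a signal, take its discrete Fourier transform, and read off the frequencies it contains.

```python
import numpy as np

# One second of a signal made of two tones: 440 Hz and 880 Hz.
fs = 8000                                   # sampling rate, samples per second
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

# The discrete Fourier transform gives the signal's spectrum.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The two largest peaks sit at the two tone frequencies.
print(sorted(freqs[np.argsort(spectrum)[-2:]]))  # approximately [440.0, 880.0]
```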