Shannon's Theory of Communication

Similar documents
1. Basics of Information

6.02 Fall 2012 Lecture #1

Lecture 11: Information theory THURSDAY, FEBRUARY 21, 2019

Shannon s noisy-channel theorem

Entropies & Information Theory

Lecture 8: Shannon s Noise Models

CSCI 2570 Introduction to Nanocomputing

Channel capacity. Outline : 1. Source entropy 2. Discrete memoryless channel 3. Mutual information 4. Channel capacity 5.

UNIT I INFORMATION THEORY. I k log 2

Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006

Information Theory (Information Theory by J. V. Stone, 2015)

Lecture 1: Introduction, Entropy and ML estimation

Noisy channel communication

Intro to Information Theory

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018

A Simple Introduction to Information, Channel Capacity and Entropy


Digital Communications III (ECE 154C) Introduction to Coding and Information Theory

6.3 Bernoulli Trials Example Consider the following random experiments

6.02 Fall 2011 Lecture #9

Multimedia Communications. Mathematical Preliminaries for Lossless Compression

Entropy as a measure of surprise

Chapter 9 Fundamental Limits in Information Theory

Exercise 1. = P(y a 1)P(a 1 )

An introduction to basic information theory. Hampus Wessman

4 An Introduction to Channel Coding and Decoding over BSC

Chapter 2: Source coding

Lecture 1: Shannon s Theorem

Introduction to Information Theory. Part 4

Entropy. Probability and Computing. Presentation 22. Probability and Computing Presentation 22 Entropy 1/39

9. Distance measures. 9.1 Classical information measures. Head Tail. How similar/close are two probability distributions? Trace distance.

3F1 Information Theory, Lecture 3

Topics. Probability Theory. Perfect Secrecy. Information Theory

channel of communication noise Each codeword has length 2, and all digits are either 0 or 1. Such codes are called Binary Codes.

log 2 N I m m log 2 N + 1 m.

Introduction to Information Theory

Lecture 1. Introduction

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.

Dept. of Linguistics, Indiana University Fall 2015

17.1 Binary Codes Normal numbers we use are in base 10, which are called decimal numbers. Each digit can be 10 possible numbers: 0, 1, 2, 9.

( c ) E p s t e i n, C a r t e r a n d B o l l i n g e r C h a p t e r 1 7 : I n f o r m a t i o n S c i e n c e P a g e 1

ELECTRONICS & COMMUNICATIONS DIGITAL COMMUNICATIONS

Information Theory - Entropy. Figure 3

Bandwidth: Communicate large complex & highly detailed 3D models through lowbandwidth connection (e.g. VRML over the Internet)

Notes 3: Stochastic channels and noisy coding theorem bound. 1 Model of information communication and noisy channel

Module 1. Introduction to Digital Communications and Information Theory. Version 2 ECE IIT, Kharagpur

Information in Biology

3F1 Information Theory, Lecture 3

One Lesson of Information Theory

Lecture 2: August 31

A Mathematical Theory of Communication

18.310A Final exam practice questions

Digital communication system. Shannon s separation principle

Information in Biology

Communication Theory II

Multimedia Systems WS 2010/2011

IOSR Journal of Mathematics (IOSR-JM) e-issn: , p-issn: x. Volume 9, Issue 2 (Nov. Dec. 2013), PP

(Classical) Information Theory III: Noisy channel coding

3F1 Information Theory, Lecture 1

Channel Coding I. Exercises SS 2017

ECE Information theory Final (Fall 2008)

PERFECT SECRECY AND ADVERSARIAL INDISTINGUISHABILITY

Physical Layer and Coding

Capacity of a channel Shannon s second theorem. Information Theory 1/33

1 Ex. 1 Verify that the function H(p 1,..., p n ) = k p k log 2 p k satisfies all 8 axioms on H.

Chapter 2: Entropy and Mutual Information. University of Illinois at Chicago ECE 534, Natasha Devroye

Some Basic Concepts of Probability and Information Theory: Pt. 2

Lecture 4: Proof of Shannon s theorem and an explicit code

Lecture 8: Conditional probability I: definition, independence, the tree method, sampling, chain rule for independent events

Information Theory CHAPTER. 5.1 Introduction. 5.2 Entropy

List Decoding of Reed Solomon Codes

Quantitative Biology Lecture 3

Some Concepts in Probability and Information Theory

Information Theory, Statistics, and Decision Trees

Information and Entropy. Professor Kevin Gold

Noisy-Channel Coding

Information & Correlation

Lecture 18: Shanon s Channel Coding Theorem. Lecture 18: Shanon s Channel Coding Theorem

Chapter 2 Date Compression: Source Coding. 2.1 An Introduction to Source Coding 2.2 Optimal Source Codes 2.3 Huffman Code

Classification & Information Theory Lecture #8

Exercises with solutions (Set B)

MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK UNIT V PART-A. 1. What is binary symmetric channel (AUC DEC 2006)

exercise in the previous class (1)

Roll No. :... Invigilator's Signature :.. CS/B.TECH(ECE)/SEM-7/EC-703/ CODING & INFORMATION THEORY. Time Allotted : 3 Hours Full Marks : 70

INTRODUCTION TO INFORMATION THEORY

Chapter 4. Data Transmission and Channel Capacity. Po-Ning Chen, Professor. Department of Communications Engineering. National Chiao Tung University

2. Polynomials. 19 points. 3/3/3/3/3/4 Clearly indicate your correctly formatted answer: this is what is to be graded. No need to justify!

Welcome to Comp 411! 2) Course Objectives. 1) Course Mechanics. 3) Information. I thought this course was called Computer Organization

And for polynomials with coefficients in F 2 = Z/2 Euclidean algorithm for gcd s Concept of equality mod M(x) Extended Euclid for inverses mod M(x)

Introduction to Information Theory. By Prof. S.J. Soni Asst. Professor, CE Department, SPCE, Visnagar

Massachusetts Institute of Technology

Principles of Communications

Mutual Information & Genotype-Phenotype Association. Norman MacDonald January 31, 2011 CSCI 4181/6802

Run-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE

Decision Trees. CSC411/2515: Machine Learning and Data Mining, Winter 2018 Luke Zettlemoyer, Carlos Guestrin, and Andrew Moore

PCM Reference Chapter 12.1, Communication Systems, Carlson. PCM.1

Context: Theory of variable privacy. The channel coding theorem and the security of binary randomization. Poorvi Vora Hewlett-Packard Co.

Communications Theory and Engineering

ELEMENT OF INFORMATION THEORY

ECS 332: Principles of Communications 2012/1. HW 4 Due: Sep 7

Transcription:

Shannon's Theory of Communication: an operational introduction. 5 September 2014, Introduction to Information Systems. Giovanni Sileno (g.sileno@uva.nl), Leibniz Center for Law, University of Amsterdam.

Fundamental basis of any communication

The fundamental basis of communication is to reproduce at one point a message selected at another point.

We focus on the transmission system: the communication channel.

Problem: how much information can we transmit reliably?

Quantification

Quantify information What information are we talking about? That concerning the signal, not the interpreted meaning of the signal.

Question: how much information do the next pages contain, taken as separate sources?

Quantify information A datum is reducible to a lack of uniformity: differentiation in the signal produces data. How much can we differentiate the signal? It depends on how the signal is constructed!


Quantification - 2


Exercise I choose a card at random from a deck of 16 cards, numbered from 1 to 16. How many yes/no questions do you need to ask to discover which card I am holding? 01. 02. 03. 04. 05. 06. 07. 08. 09. 10. 11. 12. 13. 14. 15. 16. Dichotomy method: Is it greater than 8? No. (8 cards remaining) Is it greater than 4? Yes. (4 cards remaining) Is it greater than 6? No. (2 cards remaining) Is it greater than 5? No. (1 card remaining) Four questions are enough: halving 16 cards four times leaves a single card.

Unit of measuring information The cards correspond to the indexed symbols available at the source. A BIT, for binary digit, corresponds to one answer to a YES/NO question.

Basic formulas: # questions = log2(# symbols); # symbols = 2^(# questions)

Trick for log2 without a calculator Write the series of powers of 2 together with their exponents:

powers of 2: 2  4  8  16  32  64  128  256  512  1024 ...
exponents:   1  2  3   4   5   6    7    8    9    10 ...

From this table we read, for example, that 2^7 = 128, log2(32) = 5, 2^-4 = 1/16, log2(1/16) = -4.
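
As a quick check of the formulas above, here is a minimal Python sketch (my addition, not part of the original slides) that computes the number of yes/no questions needed for a given number of equiprobable symbols:

```python
import math

def questions_needed(num_symbols: int) -> int:
    """Number of yes/no questions needed to single out one of
    num_symbols equiprobable symbols: ceil(log2(num_symbols))."""
    return math.ceil(math.log2(num_symbols))

print(questions_needed(16))    # 4, as in the card exercise
print(questions_needed(1024))  # 10, matching the table above
print(2 ** 7, math.log2(32))   # 128 5.0
```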

Exercise How many symbols do we have in this class? As individuals? As genders? How many symbols are there in this question? As letters? As words? Calculate how many bits we need to index them.

NOTA BENE: the answers depend on the referent and on the filter chosen!

From binary to decimal code

place values: 8 4 2 1 (that is, 2^3 2^2 2^1 2^0)

0 0 0 0 = 0
0 1 0 0 = 4
0 1 0 1 = 4 + 1 = 5
1 1 1 1 = 8 + 4 + 2 + 1 = 15

N bits index 2^N integers, from 0 to 2^N - 1.

Exercise: write 9 with 4 bits; write 01010 in decimal; how many bits do we need to write 5632?
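
The conversions above can be checked with the Python built-ins (a small sketch of mine; the example values are illustrative and deliberately different from the exercise):

```python
# Decimal to binary with a fixed width, and back again.
print(format(5, '04b'))      # '0101'  (4 + 1 = 5)
print(format(15, '04b'))     # '1111'  (8 + 4 + 2 + 1 = 15)
print(int('0100', 2))        # 4
print((1000).bit_length())   # 10: the minimum number of bits needed to write 1000
```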

Encoding

Idea: instead of transmitting the original analog signal...

encoding / decoding ...what if we transmit the corresponding digital signal?

encoding / decoding ...as we are performing this transformation anyway, we can choose an encoding that is more efficient than others!

Information For instance, we could use the statistical information we know about the source. Intuitively, common symbols transport less information and rare symbols transport more information: information can be interpreted as the surprise, the unexpectedness, of a symbol. But common or rare to WHOM?

Information Suppose each symbol x is extracted from the source with a certain probability p(x). We define the information associated with the symbol x, with respect to the source, as: I(x) = -log2(p(x)) Unit of measure: bit

How to draw the log [plot of I = -log2(p) for p between 0 and 1: I decreases towards 0 as p approaches 1, and I = 1 at p = 0.5] Ex. fair coin: 2 symbols (head, tail); p(head) = 1/2, so I(head) = 1 bit
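
A minimal Python sketch (my addition) of the surprisal formula, reproducing the fair-coin value:

```python
import math

def surprisal(p: float) -> float:
    """Information content I(x) = -log2(p) of a symbol with probability p, in bits."""
    return -math.log2(p)

print(surprisal(1/2))  # 1.0 bit  (fair coin)
print(surprisal(1/4))  # 2.0 bits (a rarer symbol carries more information)
```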

Multiple extractions Assuming the symbols are statistically independent, we can calculate the probability of multiple extractions as p(x AND y) = p(x) * p(y), and therefore I(x AND y) = I(x) + I(y).

Entropy Entropy is the average quantity of information of the source. For a source with two symbols x and y: H = p(x) * I(x) + p(y) * I(y) = -p(x) * log2(p(x)) - p(y) * log2(p(y)) It can be interpreted as the average missing information (required to specify an outcome when we know the source probability distribution). Unit of measure: bit/symbol.

It depends on the alphabet of symbols of the source and on the associated probability distribution. Ex. fair coin: H = 0.5 * 1 + 0.5 * 1 = 1 bit/symbol.
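
A short Python sketch (my addition) of the entropy of a source, given its probability distribution; it reproduces the fair-coin example:

```python
import math

def entropy(probs) -> float:
    """Average information of a source: H = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))       # 1.0 bit/symbol (fair coin)
print(entropy([1/3, 1/3, 1/3]))  # ~1.585 bits/symbol, i.e. log2(3)
```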

Maximum Entropy and Redundancy The maximum entropy of a source of N symbols is: max H = log2(N) It is obtained only when all symbols have the same probability.

A measure of redundancy: Redundancy = (max H - actual H) / max H
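
Continuing the previous sketch (again my addition), redundancy follows directly from the definition:

```python
import math

def redundancy(probs) -> float:
    """Relative redundancy of a source: (max H - actual H) / max H."""
    max_h = math.log2(len(probs))
    actual_h = -sum(p * math.log2(p) for p in probs if p > 0)
    return (max_h - actual_h) / max_h

print(redundancy([0.5, 0.5]))  # 0.0: a fair coin has no redundancy
print(redundancy([0.9, 0.1]))  # ~0.53: a biased coin is redundant
```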

Exercise Calculate the information per symbol for a source with an alphabet of statistically independent, equally probable symbols, consisting of 1 symbol; of 2 symbols; of 16 symbols. What is the entropy?

Exercise Calculate the entropy of a source with an alphabet consisting of 4 symbols with these probabilities: p1 = 1/2, p2 = 1/4, p3 = p4 = 1/8. What is the entropy if the symbols are equally probable? What is the redundancy?

Compression

Source encoding In order to reduce the resource requirements (bandwidth, power, etc.), we are interested in minimizing the average length of the code words.

Goal: reduce the use of the signal (encoding / decoding). Basic idea: associate shorter codes to the more frequent symbols. In doing so we are performing a compression of the messages!

Huffman algorithm It produces an optimal encoding (in terms of average code length). Three steps: (1) order the symbols according to their probability; (2) group the two least probable symbols and sum their probabilities, associating them to a new equivalent compound symbol; (3) repeat until you obtain a single equivalent compound symbol with probability 1.

Example: Huffman algorithm A source emits 5 symbols with probabilities 1/3, 1/4, 1/6, 1/6, 1/12.

Example: Huffman algorithm
Symbols and probabilities: A] 1/3, B] 1/4, C] 1/6, D] 1/6, E] 1/12.
Step 1: merge D (1/6) and E (1/12) into a compound symbol of probability 1/4.
Step 2: merge C (1/6) with the compound DE (1/4), obtaining probability 5/12.
Step 3: merge A (1/3 = 4/12) and B (1/4 = 3/12), obtaining probability 7/12.
Step 4: merge the two remaining compounds (7/12 and 5/12) into the root, with probability 1.
Reading the 0/1 branch labels from the root down gives the codes: A] 00, B] 01, C] 10, D] 110, E] 111.
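
For reference, a compact Python sketch of the Huffman construction (my addition, standard library only). Because C and D have equal probability, ties may be broken differently than on the slides, so the individual codewords can differ, but the average code length comes out the same:

```python
import heapq
from fractions import Fraction

def huffman(probs: dict) -> dict:
    """Build a Huffman code for {symbol: probability}; returns {symbol: codeword}."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ''}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)   # least probable group
        p1, _, codes1 = heapq.heappop(heap)   # second least probable group
        merged = {s: '0' + c for s, c in codes0.items()}
        merged.update({s: '1' + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

probs = {'A': Fraction(1, 3), 'B': Fraction(1, 4),
         'C': Fraction(1, 6), 'D': Fraction(1, 6), 'E': Fraction(1, 12)}
code = huffman(probs)
print(code)  # an optimal prefix code (codewords may differ from the slide because of ties)
print(float(sum(p * len(code[s]) for s, p in probs.items())))  # 2.25 bits/symbol on average
```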

Source Coding Theorem Given a source with entropy H, it is always possible to find an encoding whose average code length satisfies: H <= average code length < H + 1

In the previous example: H = -1/3 * log2(1/3) - 1/4 * log2(1/4) - 1/6 * log2(1/6) - 1/6 * log2(1/6) - 1/12 * log2(1/12) ≈ 2.19, and ACL = 2 * 1/3 + 2 * 1/4 + 2 * 1/6 + 3 * 1/6 + 3 * 1/12 = 2.25.
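
These two numbers can be checked with a few lines of Python (my addition):

```python
import math

probs = [1/3, 1/4, 1/6, 1/6, 1/12]
lengths = [2, 2, 2, 3, 3]   # codeword lengths from the Huffman example

H = -sum(p * math.log2(p) for p in probs)
ACL = sum(p * l for p, l in zip(probs, lengths))
print(round(H, 2), round(ACL, 2))  # 2.19 2.25, so H <= ACL < H + 1 holds
```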


Exercise Propose an encoding for a communication system associated with a sensor placed in a rainforest. The sensor recognizes the warbles/tweets of birds of several species, whose presence is described by these statistics: p(toucan) = 1/3, p(parrot) = 1/2, p(eagle) = 1/24, p(hornbill) = 1/8. Which of the assumptions you have used may be critical in this scenario?

Noise

Recap so far: entropy, redundancy; encoding, compression.

Types of noise Noise can be seen as an unintended source which interferes with the intended one. In terms of the outcomes, the communication channel may suffer from two types of interference: data received but unwanted, and data sent but never received.

Binary Symmetric Channel A binary symmetric channel (BSC) models the case in which a binary input may be flipped before the output. [channel diagram: input 0 goes to output 0 with probability 1 - p_e and to output 1 with probability p_e; input 1 goes to output 1 with probability 1 - p_e and to output 0 with probability p_e] p_e = error probability; 1 - p_e = probability of correct transmission.

Binary Symmetric Channel

Transmission probabilities on 1 bit:
Input 0 -> Output 0: 1 - p_e
Input 0 -> Output 1: p_e

Transmission probabilities on 2 bits:
Input 00 -> Output 00: (1 - p_e) * (1 - p_e)
Input 00 -> Output 01: (1 - p_e) * p_e
Input 00 -> Output 10: p_e * (1 - p_e)
Input 00 -> Output 11: p_e * p_e
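
A small Python sketch (my addition) that tabulates these probabilities for an arbitrary input word; the value of p_e is only illustrative:

```python
from itertools import product

def output_probabilities(sent: str, p_e: float) -> dict:
    """Probability of each received word when `sent` crosses a BSC that flips
    each bit independently with probability p_e."""
    result = {}
    for received in product('01', repeat=len(sent)):
        word = ''.join(received)
        prob = 1.0
        for s, r in zip(sent, word):
            prob *= p_e if s != r else 1 - p_e
        result[word] = prob
    return result

print(output_probabilities('00', 0.1))
# approximately {'00': 0.81, '01': 0.09, '10': 0.09, '11': 0.01}
```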

Exercise Consider messages of 3 bits: what is the probability that exactly 2 bits are inverted? What is the probability of error?

Error detection

Simple error detection: parity check A parity bit is added at the end of a string of bits (e.g. 7 bits): 0 if the number of 1s is even, 1 if odd.

Coding:
0000000 -> 00000000
1001001 -> 10010011
0111111 -> 01111110

Decoding while detecting errors:
01111110 -> ok
00100000 -> error detected
10111011 -> error not detected!
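
A minimal Python sketch (my addition) of this scheme, reproducing the examples above:

```python
def add_parity(bits: str) -> str:
    """Append a parity bit: 0 if the number of 1s is even, 1 if odd."""
    return bits + ('1' if bits.count('1') % 2 else '0')

def parity_ok(word: str) -> bool:
    """A received word is consistent when its total number of 1s is even."""
    return word.count('1') % 2 == 0

print(add_parity('1001001'))   # 10010011
print(parity_ok('00100000'))   # False: a single-bit error is detected
print(parity_ok('10111011'))   # True: a double error goes undetected
```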

Exercise

Add the parity bit:
01100011010?
01011100?
0010010001?
1111011100100?

Perform the parity check:
1110001010111
0001110011
1001110100
11011

Exercise Consider messages of 2 bits + 1 parity bit. What is the probability of detecting an error?

Error correction

Simple error correction Forward Error Correction with (3, 1) repetition: each bit is transmitted three times (repeated twice more).

Coding:
0 -> 000
1 -> 111
11 -> 111111
010 -> 000111000

Decoding (while correcting errors, by majority vote on each triplet):
010 -> 0
011 -> 1
111101 -> 11
100011000 -> 010
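
A short Python sketch (my addition) of the (3, 1) repetition code, matching the examples above:

```python
def encode(bits: str) -> str:
    """Repeat each bit three times."""
    return ''.join(b * 3 for b in bits)

def decode(word: str) -> str:
    """Majority vote on each group of three received bits."""
    triplets = [word[i:i + 3] for i in range(0, len(word), 3)]
    return ''.join('1' if t.count('1') >= 2 else '0' for t in triplets)

print(encode('010'))        # 000111000
print(decode('100011000'))  # 010: a single flipped bit per triplet is corrected
```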

Exercise Decode and identify the errors in the following received sequences: 011000110101, 010111001000, 001001000011, 111011001001

Summary

Summary of concepts: entropy, redundancy, channel capacity; encoding, compression; decoding, error detection, error correction.

Main points - Entropy In Information Science, entropy is a measure of the uncertainty, at the reception point, about the messages generated by a source. Greater entropy means greater signal randomness; less entropy means more redundancy.

It depends on what counts as a symbol and on the associated probability distributions, which are always adopted by an observer.

Side comment - Entropy In Physics, entropy is a function related to the amount of disorder. It always increases (even if it may decrease locally).

Main points - Redundancy & Noise As all communications suffer from noise to a certain extent, adding some redundancy is good for transmission, as it helps in detecting or even correcting certain errors.

Example:
Some People Can Read This -> Somr Peoplt Cat Rea Tis
SM PPL CN RD THS -> SMR PPLT CT R TS

The redundancy is expressed by the correlation between the letters composing words (so the independent-probability assumption is not valid!).

Literature

Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379-423, 623-656.
Weaver, W. (1949). Recent contributions to the mathematical theory of communication. In The Mathematical Theory of Communication.
Floridi, L. (2009). Philosophical conceptions of information. Formal Theories of Information, (2), 13-53.
Guizzo, E. M. (2003). The Essential Message: Claude Shannon and the Making of Information Theory. Massachusetts Institute of Technology.
Lesne, A. (2014). Shannon entropy: a rigorous notion at the crossroads between probability, information theory, dynamical systems and statistical physics. Mathematical Structures in Computer Science, 24(3).