
Lecture 1: Shannon's Theorem
Lecturer: Travis Gagie, January 13th, 2015

Welcome to Data Compression! I'm Travis and I'll be your instructor this week. If you haven't registered yet, don't worry, we'll work all the administrative details out later. Your mark will be based on participation in classes and exercise sessions and on a final exam. We'll let you know the exact marking scheme later. Any questions before we get started?

Data compression has been around for a long time, but it only got a theoretical foundation when Shannon introduced information theory in 1948. Proving things about information requires a precise, objective definition of what it is and how to measure it. To sidestep philosophical debates, Shannon considered the following situation:

Suppose Alice and Bob both know that a random variable X takes on values according to a probability distribution P = p_1, ..., p_n. Alice learns X's value and tells Bob. How much information does she convey in the expected case?

Shannon posited three axioms: the expected amount of information conveyed should be a continuous function of the probabilities; if all possible values are equally likely, then the expected amount of information conveyed should increase monotonically with the number of values; and if X is the combination of two random variables Y and Z, then the expected amount of information conveyed about X should be the expected amount of information conveyed about Y plus the expected amount of information conveyed about Z.

For example, suppose X takes on the values from 0 to 99, Y is the value of X's first digit and Z is the value of X's second digit. When Alice tells Bob X's value, the expected amount of information conveyed about X is the expected amount of information conveyed about Y plus the expected amount of information conveyed about Z.
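For concreteness (these numbers are mine, and they assume X is uniform over its 100 values, so that Y and Z are each uniform over 10 values), jumping ahead slightly to the formula below: the information conveyed about X works out to lg 100 = lg 10 + lg 10 ≈ 3.32 + 3.32 ≈ 6.64 bits, the sum of the amounts for the two digits.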

Shannon showed that the only function that satisfies his axioms is H(P) = Σ_i p_i log(1/p_i). This is called the entropy of P (or of X, according to some people). The base of the logarithm determines the units in which we measure information. Electrical engineers sometimes use ln and work in units called nats. Computer scientists use lg = log_2 and work in bits. The quantity lg(1/p) is sometimes called the self-information of an event with probability p.
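As a quick sanity check, here is a small Python sketch (mine, not part of the lecture) that computes H(P) in bits:

    from math import log2

    def entropy(probs):
        # H(P) = sum over i of p_i * lg(1/p_i); terms with p_i = 0 contribute 0.
        return sum(p * log2(1 / p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))                 # 1.0 bit: a fair coin flip
    print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits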

Bit is short for binary digit, and 1 bit is the amount of information conveyed when we learn the outcome of flipping a fair coin (because (1/2) lg(1/(1/2)) + (1/2) lg(1/(1/2)) = 1). Unfortunately, "bit" has (at least) two meanings in computer science: "Alice sent 10 bits [symbols transmitted] to send Bob 3 bits [units of information]."

More importantly (for us), Shannon showed that the minimum expected message length Alice can achieve with any binary prefix-free code is in [H(P), H(P) + 1). We consider prefix-free codes because Alice can't relinquish control of the channel to the next transmitter until appending more bits can't change how Bob will interpret her message.

First we'll prove the upper bound. We can assume without loss of generality that p_1 ≥ ... ≥ p_n > 0. Building a binary prefix-free code with expected message length l is equivalent to building a binary tree on n leaves at depths d_1, ..., d_n such that Σ_i p_i d_i = l.

Consider the binary representations of the partial sums 0, p_1, p_1 + p_2, ..., p_1 + ... + p_{n-1}. Since the ith partial sum differs from all the others by at least p_i, the ith binary representation differs from all the others on at least one of its first ⌈lg(1/p_i)⌉ bits (to the right of the point).

To see why, notice that if two binary fractions agree on their first b bits, then they differ by strictly less than 2^{-b}. Therefore, the ith binary representation agrees with each other representation on fewer than ⌈lg(1/p_i)⌉ of its leading bits, so taking the first ⌈lg(1/p_i)⌉ bits of the ith representation as the codeword for the ith value yields a prefix-free code.

All this means we can build a binary prefix-free code with expected message length Σ_i p_i ⌈lg(1/p_i)⌉ < Σ_i p_i (lg(1/p_i) + 1) = H(P) + 1. Notice we achieve expected message length H(P) if each p_i is an integer power of 2.
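Here is a rough Python sketch of this construction (sometimes called a Shannon code); the function and variable names are my own, and it assumes the probabilities are already sorted in non-increasing order and sum to 1:

    from math import ceil, log2

    def shannon_code(probs):
        codewords = []
        cumulative = 0.0
        for p in probs:
            length = ceil(log2(1 / p))   # keep the first ceil(lg(1/p)) bits
            bits = ''
            frac = cumulative            # binary expansion of the partial sum
            for _ in range(length):
                frac *= 2
                bit = int(frac)
                bits += str(bit)
                frac -= bit
            codewords.append(bits)
            cumulative += p
        return codewords

    print(shannon_code([0.5, 0.25, 0.125, 0.125]))  # ['0', '10', '110', '111']

With dyadic probabilities like these, the codeword lengths equal lg(1/p_i) exactly and the expected length is H(P); floating-point rounding can matter for less tidy distributions.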

Now we'll prove the lower bound. For any binary prefix-free code with which we can encode X's possible values, consider the corresponding binary tree on n leaves. Without loss of generality, we can assume this tree is strictly binary. Let d_1, ..., d_n be the depths of the leaves in order by their corresponding values, and let Q = q_1, ..., q_n = 1/2^{d_1}, ..., 1/2^{d_n}.

The amount by which the expected message length of this code exceeds H(P) is Σ_i p_i d_i − H(P) = (1/ln 2) Σ_i p_i ln(p_i/q_i). Since ln x ≤ x − 1 for x > 0, with equality if and only if x = 1, we have Σ_i p_i ln(q_i/p_i) ≤ Σ_i p_i (q_i/p_i − 1) = Σ_i q_i − Σ_i p_i = 0 (here Σ_i q_i = 1 because the tree is strictly binary), with equality if and only if Q = P. Equivalently, Σ_i p_i ln(p_i/q_i) ≥ 0, again with equality if and only if Q = P.

Therefore, the expected message length exceeds H(P) unless Q = P, in which case it equals H(P). Notice we can have Q = 1/2^{d_1}, ..., 1/2^{d_n} = P only when each p_i is an integer power of 2. That is, this condition is both necessary and sufficient for us to achieve the entropy (with a binary code).
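A tiny numeric illustration (my own numbers): for the dyadic distribution P = 1/2, 1/4, 1/8, 1/8 and the tree with leaf depths 1, 2, 3, 3 we get Q = P, and the expected message length equals the entropy.

    from math import log2

    probs  = [0.5, 0.25, 0.125, 0.125]
    depths = [1, 2, 3, 3]                      # q_i = 1/2**d_i equals p_i here

    expected_length = sum(p * d for p, d in zip(probs, depths))
    entropy = sum(p * log2(1 / p) for p in probs)
    print(expected_length, entropy)            # 1.75 1.75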

Theorem (Shannon, 1948). Suppose Alice and Bob know that a random variable X takes on values according to a probability distribution P = p_1, ..., p_n. If Alice learns the value of X and tells Bob using a binary prefix-free code, then the minimum expected length of her message is at least H(P) and less than H(P) + 1, where H(P) = Σ_i p_i lg(1/p_i) is the entropy of P.

Although Shannon is known as the father of information theory, it wasn't his only important contribution to electrical engineering and computer science. He was also the first to propose modelling circuits with Boolean logic, in his master's thesis. On Thursday we'll see the result of another master's thesis: Huffman's algorithm for constructing a prefix-free code with minimum expected message length.

It's important to note that Shannon and Huffman considered X to be everything Alice wants to tell Bob. If Alice is sending, say, a 10-page report, then it's unlikely she and Bob have agreed on a distribution over all such reports. Shannon or Huffman coding can still be applied, e.g., by pretending that each character of the report is chosen independently according to a fixed distribution, then encoding each character according to a code chosen for that distribution. (Alice can start by sending Bob the distribution of characters in the report together with its length.)
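To make the per-character model concrete, here is a small Python sketch (my own example string and names, not from the lecture) that estimates a character distribution from the message itself and the per-character entropy that model implies:

    from collections import Counter
    from math import log2

    message = "the quick brown fox jumps over the lazy dog"
    counts = Counter(message)
    n = len(message)

    # Empirical distribution of characters, pretending they are i.i.d.
    bits_per_char = sum((k / n) * log2(n / k) for k in counts.values())
    print(round(bits_per_char, 2), "bits/char,",
          round(bits_per_char * n), "bits for the whole message")

Of course Alice would also have to send the distribution (or the counts) and the length, as noted above.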

We now have the option of using codes that are not prefix-free, as long as they are still uniquely decodable. A code is uniquely decodable if no two strings have the same encoding when we apply the code to each of their characters. For example, reversing each codeword in a prefix-free code produces a code that is still uniquely decodable but may no longer be prefix-free. Fortunately, the following two results imply that we have nothing to lose by continuing to consider only prefix-free codes:

Theorem (Kraft, 1949). There exists a prefix-free code with codeword lengths d_1, ..., d_n if and only if Σ_i 1/2^{d_i} ≤ 1.

Theorem (McMillan, ????). There exists a uniquely decodable code with codeword lengths d_1, ..., d_n if and only if Σ_i 1/2^{d_i} ≤ 1.
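For instance, here is a one-line Python check of the inequality in both theorems (my own helper, named kraft_ok purely for illustration):

    def kraft_ok(lengths):
        # Codeword lengths d_1, ..., d_n are feasible iff sum of 2^(-d_i) <= 1.
        return sum(2 ** -d for d in lengths) <= 1

    print(kraft_ok([1, 2, 3, 3]))  # True:  e.g. 0, 10, 110, 111
    print(kraft_ok([1, 1, 2]))     # False: no such binary code exists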

Since the characters in a normal news report aren't chosen independently and according to a fixed distribution, Shannon's lower bound doesn't hold. In fact, even pretending they are, we can still avoid paying nearly a whole extra bit per character, using a technique called arithmetic coding (invented by a Finn!). This is why you should be very careful about saying data compression schemes are optimal.

Shannon's and Huffman's results concern lossless compression. We'll avoid discussing lossy compression in this course because, to do so, we would first have to agree on a loss function, which can get messy. For example, choosing the right loss function for compressing sound files is more a question of psycho-acoustics than of mathematics.

Exercise: Modify Shannon's proof to show that, even if the encodings of X's possible values must be in a certain lexicographic order, we can still achieve expected message length less than H(P) + 2.